SINGLE POLYPEPTIDE CHAIN BINDING MOLECULES



BACKGROUND OF THE INVENTION

This application is a continuation-in-part of 
Application Serial No. 902,971, filed September 2, 
1986, the contents of which are herein fully incorpor- 
ated by reference. 

Field of the Invention 

The present invention relates to single polypeptide
chain binding molecules having the three dimensional
folding, and thus the binding ability and specificity,
of the variable region of an antibody.
Methods of producing these molecules by genetic engin- 
eering are also disclosed. 

Description of the Background Art 

The advent of modern molecular biology and immunology
has brought about the possibility of producing large
quantities of biologically active materials in highly
reproducible form and at low cost. Briefly,
the gene sequence coding for a desired natural protein 
is isolated, replicated (cloned) and introduced into a 
foreign host such as a bacterium, a yeast (or other 
fungi) or a mammalian cell line in culture, with ap- 
propriate regulatory control signals. When the sig- 
nals are activated, the gene is transcribed and trans- 
lated, and expresses the desired protein. In this 
manner, such useful biologically active materials as 
hormones, enzymes or antibodies have been cloned and 
expressed in foreign hosts.

One of the problems with this approach is that it
is limited by the "one gene, one polypeptide chain"
principle of molecular biology. In other words, a
genetic sequence codes for a single polypeptide chain. 
Many biologically active polypeptides, however, are 
aggregates of two or more chains. For example, anti- 
bodies are three-dimensional aggregates of two heavy 
and two light chains. In the same manner, large en- 
zymes, such as aspartate transcarbamylase, for example, 
are aggregates of six catalytic and six regulatory 
chains, these chains being different. In order to 
produce such complex materials by recombinant DNA 
technology in foreign hosts, it becomes necessary to 
clone and express a gene coding for each one of the 
different kinds of polypeptide chains. These genes
can be expressed in separate hosts. The resulting 
polypeptide chains from each host would then have to 
be reaggregated and allowed to refold together in
solution. Alternatively, the two or more genes coding
for the two or more polypeptide chains of the aggregate
could be expressed in the same host simultaneously, so
that refolding and reassociation into the native
structure with biological activity will occur after
expression. This approach, however, necessitates
expression of multiple genes and, as indicated, in
some cases, in multiple and different hosts. These
approaches have proved to be inefficient.

Even if the two or more genes are expressed in the 
same organism it is quite difficult to get them all 
expressed in the required amounts. 

A classical example of multigene expression to 
form multimeric polypeptides is the expression by re- 
combinant DNA technology of antibodies. Genes for 
heavy and light chains have been introduced into
appropriate hosts and expressed, followed by
reaggregation of these individual chains into functional
antibody molecules (see, for example, Munro, Nature,
312:597 (1984); Morrison, S.L., Science, 229:1202
(1985); Oi et al., BioTechniques, 4:214 (1986); Wood
et al., Nature, 314:446-449 (1985)).

Antibody molecules have two generally recognized
regions in each of the heavy and light chains. These
regions are the so-called "variable" region which is 
responsible for binding to the specific antigen in 
question, and the so-called "constant" region which is 
responsible for biological effector responses such as 
complement binding, etc. The constant regions are not 
necessary for antigen binding. The constant regions
have been separated from the antibody molecule, and 
biologically active (i.e. binding) variable regions 
have been obtained. 

The variable regions of an antibody are composed 
of a light chain and a heavy chain. Light and heavy 
chain variable regions have been cloned and expressed 
in foreign hosts, and maintain their binding ability
(Moore et al., European Patent Publication 0088994
(published September 21, 1983)).

Further, it is by now well established that all
antibodies of a certain class and their Fab fragments
whose structures have been determined by X-ray
crystallography, even when from different species, show
closely similar variable regions despite large
differences in the hypervariable segments. The
immunoglobulin variable region seems to be tolerant
toward mutations in the combining loops. Therefore,
other than in the hypervariable regions, most of the
so-called "variable" regions of antibodies, which are
defined by both heavy and light chains, are in fact
quite constant in their three dimensional arrangement;
see, for example, Huber, R., "Structural Basis for
Antigen-Antibody Recognition," Science, 233:702-703
(1986).

It would be very efficient if one could produce
single polypeptide chain molecules which have the same
biological activity as the multiple chain aggregates
such as, for example, multiple chain antibody
aggregates or enzyme aggregates. Given the "one gene,
one polypeptide chain" principle, such single chain
molecules would be more readily producible, and would
not necessitate multiple hosts or multiple genes in the
cloning and expression. In order to accomplish this,
it is first necessary to devise a method for generating
single chain structures from two-chain aggregate
structures, wherein the single chain will retain the
three-dimensional folding of the separate natural
aggregate of two polypeptide chains.

While the art has discussed the study of proteins 
in three dimensions, and has suggested modifying their 
architecture (see, for example, the article "Protein 
Architecture: Designing from the Ground Up," by Van
Brunt, J., BioTechnology, 4:277-283 (April, 1986)),
the problem of generating single chain structures from
multiple chain structures, wherein the single chain
structure will retain the three-dimensional
architecture of the multiple chain aggregate, has not been
satisfactorily addressed. 

Given that methods for the preparation of genetic
sequences, their replication, their linking to
expression control regions, formation of vectors
therewith and transformation of appropriate hosts are
well understood techniques, it would indeed be greatly
advantageous to be able to produce, by genetic
engineering, single polypeptide chain binding proteins
having the characteristics and binding ability of
multichain variable regions of antibody molecules.

SUMMARY OF THE INVENTION

The present invention starts with a computer based 
system and method to determine chemical structures for 
converting two naturally aggregated but chemically 
separated light and heavy polypeptide chains from an 
antibody variable region into a single polypeptide 
chain which will fold into a three dimensional struc- 
ture very similar to the original structure made of 
the two polypeptide chains. 

The single polypeptide chain obtained from this 
method can then be used to prepare a genetic sequence 
coding therefor. The genetic sequence can then be 
replicated in appropriate hosts, further linked to 
control regions, and transformed into expression 
hosts, wherein it can be expressed. The resulting 
single polypeptide chain binding protein, upon refold- 
ing, has the binding characteristics of the aggregate 
of the original two (heavy and light) polypeptide 
chains of the variable region of the antibody. 

The invention therefore comprises: 

A single polypeptide chain binding molecule which
has binding specificity substantially similar to the
binding specificity of the light and heavy chain
aggregate variable region of an antibody.

The invention also comprises genetic sequences
coding for the above mentioned single polypeptide
chain, cloning and expression vectors containing such
genetic sequences, hosts transformed with such
vectors, and methods of production of such polypeptides
by expression of the underlying genetic sequences in
such hosts.

The invention also extends to uses for the binding
proteins, including uses in diagnostics, therapy, in
vivo and in vitro imaging, purifications, and
biosensors. The invention also extends to the single
chain binding molecules in immobilized form, or in
detectably labelled forms for utilization in the above
mentioned diagnostic, imaging, purification or biosensor
applications. It also extends to conjugates of the
single polypeptide chain binding molecules with
therapeutic agents such as drugs or specific toxins, for
delivery to a specific site in an animal, such as a
human patient.

Essentially all of the uses that the prior art has 
envisioned for monoclonal or polyclonal antibodies, or 
for variable region fragments thereof, can be
considered for the molecules of the present invention.

The advantages of single chain over conventional
antibodies are smaller size, greater stability and
significantly reduced cost. The smaller size of single
chain antibodies may reduce the body's immunologic
reaction and thus increase the safety and efficacy of
therapeutic applications. Conversely, the single
chain antibodies could be engineered to be highly
antigenic. The increased stability and lower cost
permit greater use in biosensors and protein
purification systems. Because it is a smaller and
simpler protein, the single chain antibody is easier to
further modify by protein engineering so as to improve
both its binding affinity and its specificity.
Improved affinity will increase the sensitivity of
diagnosis and detection systems while improved
specificity will reduce the number of false positives
observed.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention as defined in the claims can 
be better understood with reference to the text and to 
the following drawings, as follows: 

Figure 1 is a block diagram of the hardware as- 
pects of the serial processor mode of the present in- 
vention. 

Figure 2 is a block diagram of an alternate embod- 
iment of the hardware aspects of the present inven- 
tion. 

Figure 3 is a block diagram of the three general 
steps of the present invention. 

Figure 4 is a block diagram of the steps in the 
site selection step in the single linker embodiment. 

Figure 5A is a schematic two dimensional simplified
representation of the light chain L and heavy chain
H of two naturally aggregated antibody variable region
Fv polypeptide chains used to illustrate the site
selection process.

Figure 5B is a two dimensional representation of
the three dimensional relationship of the two
aggregated polypeptide chains showing the light chain L
( ) and the heavy chain H (-) of the variable
region of one antibody.

Figure 6A is a simplified two dimensional schematic
diagram of the two polypeptide chains showing the
location of the residue Tau 1 and the residue Sigma 1.

Figure 6B is a two dimensional representation of 
the actual relationship of the two polypeptide chains 
showing the residue Tau 1 and the residue Sigma 1. 

Figure 7 shows in a very simplified schematic way
the concept of the direction linkers that are possible
between the various possible sites on the light chain
L and the heavy chain H in the residue Tau 1 and
residue Sigma 1 respectively.

Figure 8A is a two dimensional simplified schematic
diagram of a single chain antibody linking together
two separate chains by linker 1 to produce a single
chain antibody.

Figure 8B is a two dimensional representation
showing a single chain antibody produced by linking
two aggregated polypeptide chains using linker 1.

Figure 9 shows a block diagram of candidate selec- 
tion for correct span. 

Figure 10 shows a block diagram of candidate
selection for correct direction from N terminal to C
terminal.

Figure 11 shows a comparison of direction of a gap 
to direction of a candidate. 

Figure 12 shows a block diagram of candidate sel- 
ection for correct orientation at both ends. 

Figure 13 shows a block diagram of selection of 
sites for the two-linker embodiment. 

Figure 14 shows examples of rules by which
candidates may be ranked.

Figure 15A shows a two-dimensional simplified
representation of the variable domain of an Fv light
chain, L, and the variable domain of an Fv heavy
chain, H, showing the first two sites to be linked.

Figure 15B shows a two-dimensional representation 
of the three-dimensional relationships between the 
variable domain of an Fv light chain, L, and the
variable domain of an Fv heavy chain, H, showing the
regions in which the second sites to be linked can be
found and the linker between the first pair of sites. 

Figure 16A shows the two-dimensional simplified 
representation of the variable domain of an Fv light 
chain, L, and the variable domain of an Fv heavy 
chain, H, showing the regions in which the second 
sites to be linked can be found and the linker between 
the first pair of sites. 

Figure 16B shows the two-dimensional representation
of the three-dimensional relationships between
the variable domain of an Fv light chain, L, and the
variable domain of an Fv heavy chain, H, showing the
regions in which the second sites to be linked can be
found and the linker between the first pair of sites.

Figure 17A shows the two-dimensional simplified 
representation of the variable domain of an Fv light 
chain, L, and the variable domain of an Fv heavy
chain, H, showing the second linker and the portions
of the native protein which are lost. 

Figure 17B shows the two-dimensional representa- 
tion of the three-dimensional relationships between 
the variable domain of an Fv light chain, L, and the 
variable domain of an Fv heavy chain, H, showing the
second linker and the portions of native protein which 
are lost.

Figure 18 shows the two-dimensional simplified
representation of the variable domain of an Fv light
chain, L, and the variable domain of an Fv heavy
chain, H, showing the complete construction.

Figure 19 shows a block diagram of the parallel 
processing mode of the present invention. 

Figure 20A shows five pieces of molecular
structure. The uppermost segment consists of two peptides
joined by a long line. The separation between the
peptides is 12.7 Å. The first Cα of each peptide
lies on the X-axis. The two dots indicate the
standard reference point in each peptide.

Below the gap are four linker candidates (labeled
1, 2, 3 & 4), represented by a line joining the alpha
carbons. In all cases, the first and penultimate
alpha carbons are on lines parallel to the X-axis,
spaced 8.0 Å apart. Note that the space between dots
in linker 1 is much shorter than in the gap.

Figure 20B shows the initial peptides of linkers
2, 3, and 4, which have been aligned with the first
peptide of the gap. For clarity, the linkers have
been translated vertically to their original
positions.

The vector from the first peptide in the gap to
the second peptide in the gap lies along the X-axis; a
corresponding vector for linkers 3 and 4 also lies
along the X-axis. Linker 2, however, has this vector
pointing up and to the right, thus linker 2 is
rejected.



Figure 20C shows the ten atoms which compose the
initial and final peptides of linkers 3 and 4, which
have been least-squares fit to the corresponding atoms
from the gap. These peptides have been drawn in.
Note that in the gap and in linker 4 the final peptide
points down and lies more-or-less in the plane of the 
paper. In linker 3, however, this final pep- 
tide points down and to the left and is twisted about 
90 degrees so that the carbonyl oxygen points toward 
the viewer. Thus linker 3 is rejected. 

Sections B and C are stereo diagrams which may be 
viewed with the standard stereo viewer provided. 

Figure 21 shows the nucleotide sequence and trans- 
lation of the sequence for the heavy chain of a mouse 
anti bovine growth hormone (BGH) monoclonal antibody. 

Figure 22 shows the nucleotide sequence and trans- 
lation of the sequence for the light chain of the same 
monoclonal antibody as that shown in Figure 21. 

Figure 23 is a plasmid restriction map containing
the variable heavy chain sequence (pGX3772) and
that containing the variable light sequence (pGX3773)
shown in Figures 21 and 22.

Figure 24 shows construction TRY40 comprising the 
nucleotide sequence and its translation sequence of a 
single polypeptide chain binding protein prepared ac- 
cording to the methods of the invention. 

Figure 25 shows a restriction map of the expression
vector pGX3776 carrying a single chain binding
protein, the sequence of which is shown in Figure 24.
In this and subsequent plasmid maps (Figures 27 and
29) the hashed bar represents the promoter OL/PR
sequence and the solid bar represents heavy chain
variable region sequences.

Figure 26 shows the sequences of another 
single chain binding protein of the invention.

Figure 27 shows expression plasmid pGX4904 carrying
the genetic sequence shown in Figure 26.

Figure 28 shows the sequences of TRY59, another 
single chain binding protein of the invention. 

Figure 29 shows the expression plasmid pGX4908
carrying the genetic sequence shown in Figure 28. 

Figures 30A, 30B, 30C, and 30D (stereo) are
explained in detail in Example 1. They show the design
and construction of double linked single chain
antibody TRY40.

Figures 31A and 31B (stereo) are explained in
detail in Example 2. They show the design and
construction of single linked single chain antibody TRY61.

Figures 32A and 32B (stereo) are explained in
detail in Example 3. They show the design and
construction of single linked single chain antibody TRY59.

Figure 33 is explained in Example 4 and shows the
sequence of TRY104b.

Figure 34 shows a restriction map of the expression
vector pGX4910 carrying a single linker construction,
the sequence of which is shown in Figure 33.

Figure 35 shows the assay results for BGH binding 
activity wherein strip one represents TRY61 and strip 
two represents TRY40. 

Figure 36 is explained in Example 4 and shows the
results of competing the portion of 3C2 monoclonal
with TRY59 protein.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

TABLE OF CONTENTS

I. General Overview

II. Hardware and Software Environment

III. Single Linker Embodiment

A. Plausible Site Selection

B. Selection of Candidates

1. Selecting Candidates with Proper
Distance Between the N Terminal and
the C Terminal

2. Selecting Candidates with Proper
Direction From the N Terminal to
the C Terminal

3. Selecting Candidates With Proper
Orientation between the Termini

C. Ranking and Eliminating Candidates

IV. Double and Multiple Linker Embodiments

A. Plausible Site Selection

B. Candidate Selection and Candidate Rejection
Steps

V. Parallel Processing Embodiment

VI. Preparation and Expression of Genetic
Sequences and Uses

I. General Overview 

The present invention starts with a computer based 
system and method for determining and displaying pos- 
sible chemical structures (linkers) for converting two 
naturally aggregated but chemically separate heavy and
light (H and L) polypeptide chains from the variable
region of a given antibody into a single polypeptide
chain which will fold into a three dimensional
structure very similar to the original structure made of
two polypeptide chains. The original structure is 
referred to hereafter as "native protein." 

The first general step of the three general design 
steps of the present invention involves selection of 
plausible sites to be linked. In the case of a single 
linker, criteria are utilized to select a plausible 
site on each of the two polypeptide chains (H and L in 
the variable region) which will result in 1) a minimum 
loss of residues from the native protein chains and 2) 
a linker of minimum number of amino acids consistent 
with the need for stability. A pair of sites defines
a gap to be bridged or linked. 

A two-or-more-linker approach is adopted when a
single linker cannot achieve the two stated goals.
In both the single-linker case and the
two-or-more-linker case, more than one gap may be
selected for use in the second general step.

The second general step of the present invention
involves examining a data base to determine possible
linkers to fill the plausible gaps selected in the
first general step, so that candidates can be enrolled
for the third general step. Specifically, a data base
contains a large number of amino acid sequences for
which the three-dimensional structure is known. In
the second general step, this data base is examined to
find which amino acid sequences can bridge the gap or
gaps to create a plausible one-polypeptide structure
which retains most of the three dimensional features
of the native (i.e., original aggregate) variable
region molecule. The testing of each possible linker
proceeds in three general substeps. The first general
substep utilizes the length of the possible candidate.
Specifically, the span or length (a scalar quantity)
of the candidate is compared to the span of each of 
the gaps. If the difference between the length of the 
candidate and the span of any one of the gaps is less 
than a selected quantity, then the present invention 
proceeds to the second general substep with respect to 
this candidate. Figure 20A shows one gap and four 
possible linkers. The first linker fails the first 
general substep because its span is quite different 
from the span of the gap. 
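
By way of illustration only, the first general substep can
be sketched in Python as follows. Only the 12.7 Å gap span is
taken from Figure 20A; the function names, the 0.5 Å
tolerance, and the candidate spans are hypothetical values
chosen for the sketch.

```python
import math

def span(start, end):
    """Scalar distance between the reference points of the two terminal peptides."""
    return math.dist(start, end)

def gaps_matching_span(candidate_span, gap_spans, tolerance):
    """Indices of the gaps whose span differs from the candidate's by less than tolerance."""
    return [i for i, g in enumerate(gap_spans) if abs(candidate_span - g) < tolerance]

# One gap, as in Figure 20A (12.7 Å between the two peptides).
gap_spans = [12.7]

# Hypothetical candidate spans in Å; "linker 1" is deliberately much shorter.
candidates = {"linker 1": 9.1, "linker 2": 12.9, "linker 3": 12.5, "linker 4": 12.6}

tolerance = 0.5  # assumed "selected quantity", in Å

# A candidate goes on to the direction substep if it matches at least one gap.
survivors = [name for name, s in candidates.items()
             if gaps_matching_span(s, gap_spans, tolerance)]
print(survivors)
```

Because the comparison uses a single scalar per candidate, it
is a cheap way to discard most of the data base before the
more expensive alignment steps.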

In the second general substep, called the direc- 
tion substep, the initial peptide of the candidate is 
aligned with the initial peptide of each gap. Speci- 
fically, a selected number of atoms in the initial 
peptide of the candidate are rotated and translated as 
a rigid body to best fit the corresponding atoms in 
the initial peptide of each gap. The three dimensional
vector (called the direction of the linker) from
the initial peptide of the candidate linker to the
final peptide of the candidate linker is compared to
the three dimensional vector (called the direction of
the gap) from the initial peptide of each gap to the
final peptide of the same gap. If the ends of these
two vectors come within a preselected distance of each
other, the present invention proceeds to the third
general substep of the second general step with
respect to this candidate linker.
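
The direction substep can be sketched as follows, under a
stated simplification: instead of a best-fit rigid-body
alignment of a selected number of atoms, the sketch anchors a
local coordinate frame on three atoms of each initial peptide
and expresses the direction vector in that frame. The two
approaches coincide when the three atoms are congruent in
both peptides. All coordinates, function names, and the 2.0 Å
threshold are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def local_frame(p0, p1, p2):
    """Right-handed orthonormal frame built on three non-collinear atoms."""
    e1 = unit(sub(p1, p0))
    e3 = unit(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return e1, e2, e3

def direction_in_frame(initial_atoms, final_ref):
    """Vector from the initial peptide to the final peptide, in the initial peptide's frame."""
    frame = local_frame(*initial_atoms)
    v = sub(final_ref, initial_atoms[0])
    return tuple(dot(v, e) for e in frame)

def passes_direction(gap, linker, max_end_distance):
    """True when the ends of the two direction vectors lie within max_end_distance."""
    return math.dist(direction_in_frame(*gap), direction_in_frame(*linker)) < max_end_distance

# Hypothetical gap: three atoms of the initial peptide, then the final reference point.
gap = ([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.2, 0.0)], (12.7, 0.0, 0.0))
# Same geometry as the gap, rotated 90 degrees about Z and shifted: should pass.
linker_a = ([(5.0, 5.0, 5.0), (5.0, 6.5, 5.0), (3.8, 7.0, 5.0)], (5.0, 17.7, 5.0))
# Similar span, but pointing "up and to the right" like linker 2 in Figure 20B.
linker_b = ([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.2, 0.0)], (9.0, 9.0, 0.0))

print(passes_direction(gap, linker_a, 2.0), passes_direction(gap, linker_b, 2.0))
```

Expressing both vectors in the frame of their own initial
peptide removes any dependence on where the linker happens to
sit in its source protein.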

Figure 20B shows one gap and three linkers. All 
the linkers have the correct span and the initial pep- 
tides have been aligned. The second linker fails the 
second general substep because its direction is quite 
different from that of the gap; the other two linkers 
are carried forward to the third general substep of 
the second general step.

In the third general substep of the second design
step of the present invention, the orientations of
the terminal peptides of each linker are compared to
the orientations of the terminal peptides of each
gap. Specifically, a selected number of atoms (5 in
the preferred embodiment) from the initial peptide of
the candidate plus the same selected number of atoms
(5 in the preferred embodiment) from the final peptide
of the candidate are taken as a rigid body. The
corresponding atoms from one of the gaps (viz., from
the initial peptide and from the final peptide) are
taken as a second rigid body. These two rigid bodies
are superimposed by a least-squares fit. If the error
for this fit is below some preselected value, then the
candidate passes the third general substep of the
second general step and is enrolled for the third
general step of the present invention. If the error
is greater than or equal to the preselected value, the
next gap is tested. When all gaps have been tested
without finding a sufficiently good fit, the candidate
is rejected.
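
A minimal sketch of the orientation substep follows. The
text calls only for a least-squares superposition of two
rigid bodies; the particular method used here (the largest
eigenvalue of Horn's quaternion matrix, found by Jacobi
rotations) is one standard way to obtain the best-fit error
and is an assumption of the sketch, as are the function
names, the coordinates, and the 0.3 Å threshold.

```python
import math

def centered(points):
    """Shift a list of (x, y, z) points so that their centroid is at the origin."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]

def largest_eigenvalue(matrix, sweeps=30):
    """Largest eigenvalue of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def best_fit_rms(candidate_atoms, gap_atoms):
    """RMS deviation after the optimal rigid superposition (rotation plus translation)."""
    a, b = centered(candidate_atoms), centered(gap_atoms)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    spread = sum(x * x for p in a + b for x in p)
    residual = max(spread - 2.0 * largest_eigenvalue(horn), 0.0)
    return math.sqrt(residual / len(a))

def passes_orientation(candidate_atoms, gaps, max_rms):
    """True when the candidate's terminal atoms fit at least one gap within max_rms."""
    return any(best_fit_rms(candidate_atoms, g) < max_rms for g in gaps)

# Hypothetical coordinates: five atoms from each terminal peptide of one gap.
gap_initial = [(0.0, 0.0, 0.0), (1.2, 0.8, 0.0), (2.4, 0.0, 0.0), (2.4, -1.2, 0.0), (3.6, 0.8, 0.0)]
gap_final = [(12.7, 0.0, 0.0), (13.9, -0.8, 0.0), (15.1, 0.0, 0.0), (15.1, 1.2, 0.0), (16.3, -0.8, 0.0)]
gap = gap_initial + gap_final

# Candidate identical to the gap apart from a rigid motion: should pass.
moved = [(-y + 5.0, x - 3.0, z + 1.0) for x, y, z in gap]
# Candidate whose final peptide is twisted 90 degrees about the X-axis: should fail.
twisted = gap_initial + [(x, -z, y) for x, y, z in gap_final]

print(passes_orientation(moved, [gap], 0.3), passes_orientation(twisted, [gap], 0.3))
```

Taking the initial and final peptides together as one rigid
body is what makes this test sensitive to a twist of the
final peptide, as with linker 3 in Figure 20C.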

The third general step of the present invention
results in the ranking of the linker candidates from
most plausible to least plausible. The most plausible
candidate is the fragment that can bridge the two
plausible sites of one of the gaps to form a single
polypeptide chain, where the bridge will least distort
the resulting three dimensional folding of the single
polypeptide chain from the natural folding of the
aggregate of the two originally chemically separate
chains.

In this third general step of the present invention,
an expert operator uses an interactive computer
graphics approach to rank the linker candidates from
most plausible to least plausible. This ranking is
done by observing the interactions between the linker 
candidate with all retained portions of the native 
protein. A set of rules is used for the ranking.
These expert system rules can be built into the system 
so that the linkers are displayed only after they have 
satisfied the expert system rules that are utilized. 

The present invention can be programmed so that 
certain expert rules are utilized as a first general 
substep in the third general step to rank candidates 
and even eliminate unsuitable candidates before visual 
inspection by an expert operator, which would be the 
second general substep of the third general step. 
These expert rules assist the expert operator in rank- 
ing the candidates from most plausible to least plaus- 
ible. These expert rules can be modified based on 
experimental data on linkers produced by the system 
and methods of the present invention.

The most plausible candidate is a genetically pro- 
ducible single polypeptide chain binding molecule 
which has a very significantly higher probability (a 
million or more as compared to a random selection) of 
folding into a three dimensional structure very simi- 
lar to the original structure made of the heavy and 
light chains of the antibody variable region than 
would be produced if random selection of the linker 
was done. In this way, the computer based system and 
method of the present invention can be utilized to 
engineer single polypeptide chains by using one or 
more linkers which convert naturally aggregated but 
chemically separated polypeptide chains into the
desired single chain.

The elected candidate offers to the user a linked
chain structure having a very significantly increased
probability of proper folding than would be obtained
using a random selection process. This means that the
genetic engineering aspect of creating the desired
single polypeptide chain is significantly reduced,
since the number of candidates that have to be
genetically engineered in practice is reduced by a
corresponding amount. The most plausible candidate can be
used to genetically engineer an actual molecule.

The parameters of the various candidates can be
stored for later use. They can also be provided to
the user either visually or recorded on a suitable
media (paper, magnetic tape, color slides, etc.). The
results of the various steps utilized in the design
process can also be stored for later use or examination.

The design steps of the present invention operate
on a conventional minicomputer system having storage
devices capable of storing the amino acid sequence-
structure data base, the various application programs
utilized and the parameters of the possible linker
candidates that are being evaluated.

The minicomputer CPU is connected by a suitable
serial processor structure to an interactive computer-
graphics display system. Typically, the interactive
computer-graphics display system comprises a display
terminal with resident three-dimensional application
software and associated input and output devices such
as X/Y plotters, position control devices
(potentiometers, an x-y tablet, or a mouse), and keyboard.

The interactive computer-graphics display system
allows the expert operator to view the chemical
structures being evaluated in the design process of the
present invention. Graphics and programs are used to
select the gaps (Gen. Step 1), and to rank candidates
(Gen. Step 3). Essentially, it operates in the same
fashion for the single linker embodiment and for the
two or more linker embodiments.

For example, during the first general step of the 
present invention, the computer-graphics interactive 
display system allows the expert operator to visually 
display the two naturally aggregated but chemically 
separate polypeptide chains. Using three dimensional 
software resident in the computer-graphics display 
system, the visual representation of the two separate 
polypeptide chains can be manipulated as desired. For 
example, the portion of the chain(s) being viewed can 
be magnified electronically, and such magnification 
can be performed in a zoom mode. Conversely, the im- 
age can be reduced in size, and this reduction can 
also be done in a reverse zoom mode. The position of
the portion of the molecule can be translated, and the 
displayed molecule can be rotated about any one of the 
three axes (x, y and z). Specific atoms in the chain 
can be selected with an electronic pointer. Selected 
atoms can be labeled with appropriate text. Specific 
portions of native protein or linker can be identified 
with color or text or brightness. Unwanted portions 
of the chain can be erased from the image being dis- 
played so as to provide the expert operator with a 
visual image that represents only a selected aspect of 
the chain(s). Atoms selected by pointing or by name 
can be placed at the center of the three dimensional
display; subsequent rotation uses the selected atom as
the origin. These and other display aspects provide
the expert operator with the ability to visually
represent portions of the chains which increase the
ability to perform the structural design process.

One of the modes of the present invention utilizes
a serial computational architecture. This architecture
using the present equipment requires approximately
four to six hours of machine and operator time in
order to go through the various operations required
for the three general steps for a particular selection 
of gaps. Obviously, it would be desirable to signifi- 
cantly reduce the time since a considerable portion 
thereof is the time it takes for the computer system 
to perform the necessary computational steps. 

An alternate embodiment of the present invention 
utilizes a parallel processing architecture. This 
parallel processing architecture significantly reduces 
the time required to perform the necessary
computational steps. A hypercube of a large number of nodes
can be utilized so that the various linkers that are
possible for the selected sites can be rapidly
presented to the expert system operator for evaluation.

Since there are between 200 and 300 known protein
structures, the parallel processing approach can be
utilized. There currently are computers commercially
available that have as many as 1,024 computing nodes.

Using a parallel processing approach, the data
base of observed peptide structures can be divided
into as many parts as there are computing nodes. For
example, if there are structures for 195 proteins with
219 amino acids each, one would have structures for
195x218 dipeptides, 195x217 tripeptides, 195x216
tetrapeptides, etc. One can extract all peptides up to
some length n. For example, if n were 30, one would
have 195x30x204 peptides. Of course, proteins vary in
length, but with 100 to 400 proteins of average length
200 (for example), and for peptide linkers up to
length 30 amino acids (or any other reasonable
number), one will have between 1,000,000 and 4,000,000
peptide structures. Once the peptides have been
extracted and labeled with the protein from which they
came, one is free to divide all the peptides as evenly
as possible among the available computing nodes.
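
The bookkeeping in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names count_peptides and partition_evenly are hypothetical, peptides are counted as contiguous windows of a chain, and a simple round-robin deal stands in for whatever division scheme an actual implementation would use.

```python
def count_peptides(n_residues, length):
    """Number of contiguous peptides of `length` residues in one chain."""
    return max(n_residues - length + 1, 0)


def partition_evenly(items, n_nodes):
    """Deal items round-robin so that node shares differ by at most one."""
    return [items[i::n_nodes] for i in range(n_nodes)]


# Example from the text: 195 proteins of 219 amino acids each.
n_proteins, chain_length = 195, 219
dipeptides = n_proteins * count_peptides(chain_length, 2)   # 195 x 218
tripeptides = n_proteins * count_peptides(chain_length, 3)  # 195 x 217

# Hypothetical labels (protein index, start position, length) for dipeptides.
labels = [(p, start, 2)
          for p in range(n_proteins)
          for start in range(count_peptides(chain_length, 2))]
shares = partition_evenly(labels, 1024)
```

Each peptide keeps a label naming the protein it came from, so a hit reported by any node can be traced back to its source structure.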

The parallel processing mode operates as follows. 
The data base of known peptides is divided among the 
available nodes. Each gap is sent to all the nodes. 
Each node takes the gap and tests it against those 
peptides which have been assigned to it and returns 
information about any peptides which fit the gap and 
therefore are candidate linkers. As the testing for
matches between peptides and gaps proceeds
independently in each node, the searching will go faster by a
factor equal to the number of nodes.
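
A minimal sketch of this mode of operation follows, assuming Python threads as stand-ins for the computing nodes. The function names, the "span" field, and the tolerance test are hypothetical; the actual fit criteria are those of the general steps described elsewhere in this specification.

```python
from concurrent.futures import ThreadPoolExecutor


def search_node(gap_length, peptides, tolerance=0.5):
    """One node: return the peptides it holds whose span matches the gap."""
    return [p for p in peptides if abs(p["span"] - gap_length) <= tolerance]


def search_all_nodes(gap_length, node_shares, tolerance=0.5):
    """Send the same gap to every node and merge the candidates returned."""
    with ThreadPoolExecutor(max_workers=len(node_shares)) as pool:
        per_node = pool.map(
            lambda share: search_node(gap_length, share, tolerance),
            node_shares,
        )
        return [hit for hits in per_node for hit in hits]


# Hypothetical data base already divided between two nodes.
node_shares = [
    [{"id": "a", "span": 10.2}, {"id": "b", "span": 14.0}],
    [{"id": "c", "span": 9.7}],
]
candidates = search_all_nodes(10.0, node_shares)
```

Because no node needs results from any other node, the only shared data are the gap sent out and the candidate list collected at the end.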

A first embodiment of the present invention uti- 
lizes a single linker to convert the naturally aggre- 
gated but chemically separate heavy and light chains 
into a single polypeptide chain which will fold into a 
three dimensional structure very similar to the orig- 
inal structure made of two polypeptide chains. 

A second embodiment utilizes two or more linkers 
to convert the two heavy and light chains into the 
desired single polypeptide chain. The steps involved
in each of these embodiments utilizing the present
invention are illustrated in the explanation below.

Once the correct amino acid sequence for a single
chain binding protein has been defined by the computer
assisted methodology, it is possible, by methods well
known to those with skill in the art, to prepare an
underlying genetic sequence coding therefor.

In preparing this genetic sequence, it is possible
to utilize synthetic DNA by synthesizing the entire
sequence de novo. Alternatively, it is possible to
obtain cDNA sequences coding for certain preserved
portions of the light and heavy chains of the desired
antibody, and splice them together by means of the
necessary sequence coding for the peptide linker, as
described.

Also by methods known in the art, the resulting
sequence can be amplified by utilizing well known
cloning vectors and well known hosts. Furthermore,
the amplified sequence, after checking for
correctness, can be linked to promoter and terminator
signals, inserted into appropriate expression vectors,
and transformed into hosts such as procaryotic or
eucaryotic hosts. Bacteria, yeasts (or other fungi) or
mammalian cells can be utilized. Upon expression,
either by itself or as part of fusion polypeptides, as
will otherwise be known to those of skill in the art,
the single chain binding protein is allowed to refold
in physiological solution, at appropriate conditions
of pH, ionic strength, temperature, and redox
potential, and purified by standard separation procedures.
These would include chromatography in its various
different types, known to those with skill in the art.

The thus obtained purified single chain binding
protein can be utilized by itself, in detectably
labelled form, in immobilized form, or conjugated to
drugs or other appropriate therapeutic agents, in
diagnostic, imaging, biosensors, purifications, and
therapeutic uses and compositions. Essentially all
uses envisioned for antibodies or for variable region
fragments thereof can be considered for the molecules
of the present invention.

II. Hardware and Software Environment 

A block diagram of the hardware aspects of the
present invention is found in Figure 1. A central
processing unit (CPU) 102 is connected to a first bus
(designated Massbus 104) and to a second bus
(designated Unibus 106). A suitable form for CPU 102 is a
model VAX 11/780 made by Digital Equipment Corporation
of Maynard, Massachusetts. Any suitable type of CPU,
however, can be used.

Bus 104 connects CPU 102 to a plurality of storage 
devices. In the best mode, these storage devices in- 
clude a tape drive unit 106. The tape drive unit 106 
can be used, for example, to load into the system the 
data base of the amino acid sequences whose three 
dimensional structures are known. A suitable form for 
tape drive 106 is a Digital Equipment Corporation mod- 
el TU 78 drive, which operates at 125 inches per sec- 
ond, and has a 1600-6250 bit per inch (BPI) dual capa- 
bility. Any suitable type of tape drive can be used, 
however. 

Another storage device is a pair of hard disk
units labeled generally by reference numeral 108. A
suitable form for disk drive 108 comprises two Digital
Equipment Corporation RM05 disk drives having, for
example, 256 Mbytes of storage per disk. Another disk
drive system is also provided in the serial processor
mode and is labeled by reference numeral 110. This
disk drive system is also connected to CPU 102 by bus
104. A suitable form for the disk system 110
comprises three Digital Equipment Corporation model RA 81
hard disk drives having, for example, 450 Mbytes of
storage per disk.

Dynamic random access memory is also provided by a
memory stage 112 also connected to CPU 102 by bus 104.
Any suitable type of dynamic memory storage device can
be used. In the serial processor mode, the memory is
made up of a plurality of semiconductor storage
devices found in a DEC model ECC memory unit. Any
suitable type of dynamic memory can be employed.

The disk drives 108 and 110 store several different
blocks of information. For example, they store
the data base containing the amino acid sequences and
structures that are read in by the tape drive 106.
They also store the application software package
required to search the data base in accordance with the
procedures of the present invention. They also store
the documentation and executables of the software.
The hypothetical molecules that are produced and
structurally examined by the present invention are
represented in the same format used to represent the
protein structures in the data base. Using this
format, these hypothetical molecules are also stored by
the disk drives 108 and 110 for use during the
structural design process and for subsequent use after the
process has been completed.

A Digital Equipment Corporation VAX/VMS operating
system allows for multiple users and assures
file system integrity. It provides virtual memory,
which relieves the programmer of having to worry about
the amount of memory that is used. Initial software
was developed under versions 3.0 to 3.2 of the VAX/VMS
operating system. The serial processor mode currently
is running on version 4.4. DEC editors and FORTRAN
compiler were utilized.

The CPU 102 is connected by Bus 106 to a
multiplexer 114. The multiplexer allows a plurality of
devices to be connected to the CPU 102 via Bus 106. A
suitable form for multiplexer 114 is a Digital
Equipment Corporation model Dz 16 terminal multiplexer. In
the preferred embodiment, two of these multiplexers
are used. The multiplexer 114 supports terminals (not
shown in Figure 1) and the serial communications (at
19.2 Kbaud, for example) to the computer-graphics
display system indicated by the dash lined box 116.

The computer-graphics display system 116 includes
an electronics stage 118. The electronics stage 118 is
used for receiving the visual image prepared by CPU
102 and for displaying it to the user on a display
(typically one involving color) 120. The electronics
stage 118, in connection with the associated subsystems
of the computer-graphics display system 116, provides
for local control of specific functions, as described
below. A suitable form of the electronics stage 118
is a model PS 320 made by Evans & Sutherland Corp. of
Salt Lake City, Utah. A suitable form for the display 120
is either a 25 inch color monitor or a 19 inch color
monitor from Evans & Sutherland.

Dynamic random access memory 122 is connected to
the electronics stage 118. Memory 122 allows the
electronics stage 118 to provide the local control of the
image discussed below. In addition, a keyboard 124 of
conventional design is connected to the electronics
stage 118, as is an x/y tablet 126 and a plurality of
dials 128. The keyboard 124, x/y tablet 126, and
dials 128 in the serial processor mode are also
obtained from Evans & Sutherland.

The computer generated graphics system 116, as 
discussed above, receives from CPU 102 the image to be 
displayed. It provides local control over the dis- 
played image so that specific desired user initiated 
functions can be performed, such as: 

(1) zoom (so as to increase or decrease the size
of the image being displayed);

(2) clipping (where the sides, front or back of
the image being displayed are removed);

(3) intensity depth cueing (where objects further
away from the viewer are made dimmer so as to provide 
a desired depth effect in the image being displayed); 

(4) translation of the image in any of the three 
axes of the coordinate system utilized to plot the 
molecules being displayed; 

(5) rotation in any of the three directions of 
the image being displayed; 

(6) on/off control of the logical segments of the
picture. For example, a line connecting the alpha
carbons of the native protein might be one logical
segment; labels on some or all of the residues of the
native protein might be a second logical segment; a
trace of the alpha carbons of the linker(s) might be a
third segment; and a stick figure connecting Carbon,
Nitrogen, Oxygen, and Sulphur atoms of the linker(s)
and adjacent residue of the native protein might be a
fourth logical segment. The user seldom wants to see
all of these at once; rather the operator first
becomes oriented by viewing the first two segments at
low magnification. Then the labels are switched off
and the linker carbon trace is turned on. Once the
general features of the linker are seen, the operator
zooms to higher magnification and turns on the
segments which hold more detail;

(7) selection of atoms in the most detailed
logical segment. Despite the power of modern graphics,
the operator can be overwhelmed by too much detail at
once. Thus the operator will pick one atom and ask to
see all amino acids within some radius of that atom,
typically 6 Angstroms, but other radii can be used.
The user may also specify that certain amino acids
will be included in addition to those that fall within
the specified radius of the selected atom;

(8) changing of the colors of various portions of
the image being displayed so as to indicate to the
viewer particular information using visual cueing.
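
The selection described in item (7) amounts to a distance test around the picked atom. The following Python sketch illustrates it under stated assumptions: the function name residues_within, the (residue id, coordinates) input layout, and the example coordinates are hypothetical and are not taken from the display software itself.

```python
import math


def residues_within(atoms, center, radius=6.0, always_include=()):
    """Return sorted residue ids having any atom within `radius` Angstroms.

    atoms: iterable of (residue_id, (x, y, z)) pairs.
    always_include: residue ids the user asks to see regardless of distance.
    """
    selected = set(always_include)
    for residue_id, xyz in atoms:
        if math.dist(xyz, center) <= radius:
            selected.add(residue_id)
    return sorted(selected)


# Hypothetical atoms along the x axis; the picked atom sits at the origin.
atoms = [("A1", (0.0, 0.0, 0.0)), ("A2", (5.0, 0.0, 0.0)), ("A3", (7.0, 0.0, 0.0))]
near = residues_within(atoms, (0.0, 0.0, 0.0))
near_plus = residues_within(atoms, (0.0, 0.0, 0.0), always_include=("A3",))
```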

As stated above, the serial processor mode of the
present invention currently is running the application
software on version 4.4 of the VAX/VMS operating
system used in conjunction with CPU 102. The
application programs were programmed using the FLECS (FORTRAN
Language with Extended Control Structures) programming
language written in 1974 by Terry Beyer of the
University of Oregon, Eugene, Oregon. FLECS is a FORTRAN
preprocessor, which allows more logical programming.
All of the code used in the serial processor mode was
developed in FLECS. It can be appreciated, however,
that the present invention encompasses other operating
systems and programming languages.

The macromolecules displayed on color display 120
of the computer-graphics display system 116 are
rendered with an extensively modified version of version
5.6 of FRODO. FRODO is a program for displaying and
manipulating macromolecules. FRODO was written by T.A.
Jones at Max Planck Institute for Biochemistry, Munich,
West Germany, for building or modeling in protein
crystallography. FRODO version 5.6 was modified so as to be
driven by command files; programs were then written to
create the command files. It is utilized by the
electronic stage 118 to display and manipulate images on
the color display 120. Again, any suitable type of
program can be used for displaying and manipulating
the macromolecules, the coordinates of which are
provided to the computer-graphics display system 116 by
the CPU 102.

Design documentation and memos were written using
PDL (Program Design Language) from Caine, Farber &
Gordon of Pasadena, California. Again, any suitable
type of program can be used for the design documents 
and memos. 

Figure 2 shows a block diagram for an improved 
version of the hardware system of the present inven- 
tion. Like numbers refer to like items of Figure 1. 
Only the differences between the serial processor mode
system of Figure 1 and the improved system of Figure 2
are discussed below.

The CPU 102' is the latest version of the VAX
11/780 from Digital Equipment Corporation. The latest
processor from DEC in the VAX product family is
approximately ten times faster than the version shown in
the serial processor mode of Figure 1.

Instead of the two RM05 disk drives 108 of Figure
1, the embodiment of Figure 2 utilizes five RA 81 disk
drive units 110'. This is to upgrade the present
system to more state of the art disk drive units, which
provide greater storage capability and faster access.

Serial processor 106 is connected directly to the
electronic stage 118' of the computer-graphics display
system 116. The parallel interface in the embodiment
of Figure 2 replaces the serial interface approach of
the serial processor mode of Figure 1. This allows
for faster interaction between CPU 102' and electronic
stage 118' so as to provide faster data display to the
expert operator.

Disposed in front of color display 120 is a stereo
viewer 202. A suitable form for stereo viewer 202 is
made by Terabit, Salt Lake City, Utah. Stereo viewer 
202 would provide better 3-D perception to the expert 
operator than can be obtained presently through rota- 
tion of the molecule. 

In addition, this embodiment replaces the FRODO
macromolecule display programs with a program designed
to show a series of related hypothetical molecules.
This newer program performs the operations more
quickly, so that the related hypothetical molecules can be
presented to the expert operator in a time short
enough to make examination less burdensome on the
operator.

The programs can be modified so as to cause the
present invention to eliminate candidates in the
second general step where obvious rules have been
violated by the structures that are produced. For
example, one rule could be that if an atom in a linker
comes closer than one Angstrom to an atom in the
native structure the candidate would be automatically
eliminated.
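
The one-Angstrom rule just described could be sketched as follows. This is a hedged illustration only: the function names has_clash and eliminate_clashing are hypothetical, atoms are represented simply as coordinate tuples in Angstroms, and a brute-force all-pairs comparison is used for clarity rather than speed.

```python
import math


def has_clash(linker_atoms, native_atoms, min_distance=1.0):
    """True if any linker atom lies closer than `min_distance` to a native atom."""
    return any(
        math.dist(a, b) < min_distance
        for a in linker_atoms
        for b in native_atoms
    )


def eliminate_clashing(candidates, native_atoms, min_distance=1.0):
    """Keep only candidate linkers that do not violate the distance rule."""
    return [c for c in candidates if not has_clash(c, native_atoms, min_distance)]


# Hypothetical data: one native atom at the origin and two one-atom linkers.
native = [(0.0, 0.0, 0.0)]
linkers = [[(0.5, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
survivors = eliminate_clashing(linkers, native)
```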

In addition, the surface accessibility of
molecules could be determined and a score based on the
hydrophobic residues in contact with the solvent could
be determined. After the exposed hydrophobic residues
have been calculated, the candidates could be ranked so
that undesired candidates could automatically be
eliminated. The protein is modeled in the present
invention without any surrounding matter. Proteins almost
always exist in aqueous solution; indeed, protein
crystals contain between 20% and 90% water and
dissolved salts which fill the space between the protein
molecules. Certain kinds of amino acids have side
chains which make favorable interactions with aqueous
solutions (serine, threonine, arginine, lysine,
histidine, aspartic acid, glutamic acid, proline,
asparagine, and glutamine) and are termed hydrophilic.
Other amino acids have side chains which are apolar
and make unfavorable interactions with water
(phenylalanine, tryptophan, leucine, isoleucine, valine,
methionine, and tyrosine) and are termed hydrophobic. In
natural proteins, hydrophilic amino acids are almost
always found on the surface, in contact with solvent;
hydrophobic amino acids are almost always inside the
protein in contact with other hydrophobic amino acids.
The remaining amino acids (alanine, glycine, and
cysteine) are found both inside proteins and on their
surfaces. The designs of the present invention should
resemble natural proteins as much as possible, so
hydrophobic residues are placed inside and hydrophilic
residues are placed outside as much as possible.
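
One way such a ranking might look is sketched below in Python. The three residue classes are taken from the paragraph above; everything else is an assumption made for illustration: the names exposed_hydrophobic_score and rank_candidates, the use of three-letter codes, and the simple count of exposed hydrophobic residues as the score. The solvent exposure flags are taken as given, since computing accessibility is a separate calculation.

```python
# Residue classes as listed in the text (three-letter codes).
HYDROPHILIC = {"SER", "THR", "ARG", "LYS", "HIS", "ASP", "GLU", "PRO", "ASN", "GLN"}
HYDROPHOBIC = {"PHE", "TRP", "LEU", "ILE", "VAL", "MET", "TYR"}
AMBIVALENT = {"ALA", "GLY", "CYS"}


def exposed_hydrophobic_score(residues):
    """Count hydrophobic residues flagged as solvent exposed (lower is better).

    residues: iterable of (three_letter_code, exposed) pairs.
    """
    return sum(1 for name, exposed in residues if exposed and name in HYDROPHOBIC)


def rank_candidates(candidates):
    """Order candidates from fewest to most exposed hydrophobic residues."""
    return sorted(candidates, key=lambda c: exposed_hydrophobic_score(c["residues"]))


# Hypothetical candidate linkers with precomputed exposure flags.
candidates = [
    {"id": "x", "residues": [("LEU", True), ("PHE", True), ("SER", True)]},
    {"id": "y", "residues": [("GLY", True), ("SER", True), ("VAL", False)]},
]
ranked = rank_candidates(candidates)
```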

Programs could be utilized to calculate an energy 
for each hypothetical structure. In addition, pro- 
grams could make local adjustments to the hypothetical 
molecules to minimize the energy. Finally, molecular
dynamics could be used to identify particularly
unstable parts of the hypothetical molecule. Although
existing programs could calculate a nominal energy for
each hypothetical structure, it has not yet been
demonstrated that such calculations can differentiate
between sequences which will fold and those that will
not. Energy minimization could also be accomplished
with extant programs, but energy minimization also
cannot differentiate between sequences which will fold
and those that will not. Molecular dynamics
simulations currently cannot be continued long enough to
simulate the actual folding or unfolding of a protein
and so cannot distinguish between stable and unstable
molecules.

Two megabytes of storage 128' in the computer
generated display system 116 are added so that several
different molecules can be stored at the display
level. These molecules then can be switched back and
forth on the color display 120 so that the expert
operator can sequentially view them while making
expert decisions. The parallel interface that is shown
in Figure 2 would allow the coordinates to be
transferred faster from the CPU 102' to the electronics
stage 118' of the computer generated display system
116.

The parallel processing architecture embodiment of
the present invention is described below in Section V.
This parallel architecture embodiment provides even
faster analysis and display.

III. Single Linker Embodiment

This first embodiment of the present invention
determines and displays possible chemical structures
for using a single linker to convert the naturally
aggregated but chemically separate heavy and light
polypeptide chains into a single polypeptide chain
which will fold into a three dimensional structure
very similar to the original structure made of two
polypeptide chains.

A. Plausible Site Selection

There are two main goals of the plausible site
selection step 302 of the present invention shown in
very generalized block diagram form in Figure 3. The
first goal is to select a first plausible site on the
first chain that is the minimum distance from the
second plausible site on the second chain. The first
point on the first chain and the second point on the
second chain comprise the plausible site.

The second goal of the site selection is to select 
plausible sites that will result in the least loss of 
native protein. Native protein is the original
protein composed of the two aggregated polypeptide chains
of the variable region. It is not chemically possible
to convert two chains to one without altering some of
the amino acids. Even if only one amino acid was
added between the carboxy terminal of the first domain
and the amino terminal of the second domain, the
charges normally present at these termini would be lost.
In the variable regions of antibodies, the termini of
the H and L chains are not very close together.
Hypothetical linkers which join the carboxy terminus of
one chain to the amino terminus of the other do not
resemble the natural variable region structures.
Although such structures are not impossible, it is more
reasonable to cut away small parts of the native pro- 
tein so that compact linkers which resemble the native 
protein will span the gap. Many natural proteins are 
known to retain their structure when one or more resi- 
dues are removed from either end. 

In the present embodiment, only a single linker
(amino acid sequence or bridge for bridging or linking
the two plausible sites to form a single polypeptide
chain) is used. Figure 4 shows in block diagram form
the steps used to select plausible sites in the single
linker embodiment. The steps of Figure 4 are a preferred
embodiment of step 302 of Figure 3.

A domain 1 is picked in a step 402 (see Figure 4). 
A schematic diagram of two naturally aggregated but 
chemically separate polypeptide chains is shown in 
Figure 5A. For purposes of illustration, assume that 
L is the light chain of the antibody variable region 
(the first polypeptide chain) and is domain 1. As 
shown in Figure 5A, light chain L is on the left side, 
and heavy chain H is on the right side. 

The next step 404 is to pick the domain 2, which,
as indicated, is the heavy chain H of the antibody
variable region on the right side of Figure 5A.

The linker that will be selected will go from
domain 1 (the light chain L) towards domain 2 (heavy
chain, H). As the linker will become part of the
single polypeptide chain, it must have the same
directionality as the polypeptides it is linking; the
amino end of the linker must join the carboxy terminal
of some amino acid in domain 1, and the carboxy
terminal of the linker must join the amino terminal of
some residue in domain 2. A starting point (first
site) on domain 1 is selected, as represented by step
406 in Figure 4. The starting point is chosen to
be close to the C (C for carboxy) terminal of domain
1; call this amino acid tau 1. It is important to
pick tau 1 close to the C terminal to minimize loss of
native protein structure. Residue tau 1 is shown
schematically in two dimensions in Figure 6A; it is
also shown in Figure 6B where it is presented in a
two-dimensional representation of the naturally
aggregated but chemically separate H and L polypeptide
chains.

Next, the final point (second site) close to the N (N
for amino) terminal of domain 2 is selected, as
indicated by step 408 of Figure 4. The final site is an
amino acid of domain 2 which will be called sigma 1.
It is important that amino acid sigma 1 be close to
the N terminal of domain 2 to minimize loss of native
protein structure. Amino acid sigma 1 is shown
schematically in Figure 6A and in the more realistic
representation of Figure 6B.

Figure 7 shows in simplified form the concept that
the linker goes from a first site at amino acid tau 1
in domain 1 to a second site at amino acid sigma 1 in
domain 2. There are a plurality of possible first
sites and a plurality of second sites, as is shown in
Figure 7. A computer program prepares a table which
contains for each amino acid in domain 1 the identity
of the closest amino acid in domain 2 and the
distance. This program uses the position of the alpha
carbon as the position of the entire amino acid. The
expert operator prepares a list of plausible amino
acids in domain 1 to be the first site, tau 1, and a
list of plausible amino acids in domain 2 to be the
second site, sigma 1. Linkers are sought from all
plausible sites tau 1 to all plausible sites sigma 1.
The expert operator must exercise reasonable judgement
in selecting the sites tau 1 and sigma 1 in deciding
that certain amino acids are more important to the
stability of the native protein than are other amino
acids. Thus the operator may select sites which are
not actually the closest.
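
The closest-residue table prepared by the computer can be illustrated with a short Python sketch. The function name closest_residue_table, the dictionary layout, and the residue labels and coordinates are hypothetical; the only element taken from the text is the use of the alpha carbon position to stand for the whole amino acid.

```python
import math


def closest_residue_table(domain1_ca, domain2_ca):
    """For each domain 1 residue, find the nearest domain 2 residue.

    Each argument maps a residue id to its alpha-carbon (x, y, z) position
    in Angstroms. Returns {residue1: (closest residue2, distance)}.
    """
    table = {}
    for res1, xyz1 in domain1_ca.items():
        res2 = min(domain2_ca, key=lambda r: math.dist(xyz1, domain2_ca[r]))
        table[res1] = (res2, math.dist(xyz1, domain2_ca[res2]))
    return table


# Hypothetical alpha-carbon coordinates for two residues in each domain.
light = {"L105": (0.0, 0.0, 0.0), "L106": (10.0, 0.0, 0.0)}
heavy = {"H1": (0.0, 3.0, 0.0), "H2": (10.0, 0.0, 4.0)}
table = closest_residue_table(light, heavy)
```

The table only informs the choice. As the text notes, the operator may still prefer a site that is not the closest when a nearer residue is judged important to stability.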

The complete designed protein molecule in
accordance with the present invention consists of the
domain 1 (of the light chain L) up to the amino acid tau
1, the linker, as shown by the directional line in
Figure 8A and in Figure 8B, and the domain 2 from
amino acid sigma 1 to the C terminus of the heavy chain,
H. As shown in Figures 8A and 8B, in the representa- 
tive example, this results in the following loss of 
native protein. 

The first loss in native protein is from the
residue after residue tau 1 to the C terminus of domain 1
(light chain L). The second loss of native protein is
from the N terminus of domain 2 (heavy chain, H) to
the amino acid before sigma 1.

As is best understood from Figure 8A, the
introduction of linker 1 produces a single polypeptide
chain from the two naturally aggregated chains. The
polypeptide chain begins with the N terminal of domain
1. Referring now to Figure 8B, the chain proceeds
through almost the entire course of the native light
chain, L, until it reaches amino acid tau 1. The
linker then connects the carboxy terminal of a very
slightly truncated domain 1 to residue sigma 1 in the
very slightly truncated domain 2. Since a minimum
amount of native protein is eliminated, and the linker
is selected to fit structurally as well as possible
(as described below in connection with general steps 2
and 3 of the present invention), the resulting single
polypeptide chain has a very high probability (several
orders of magnitude greater than if the linker was
selected randomly) to fold into a three-dimensional
structure very similar to the original structure made
of two polypeptide chains.

The single polypeptide chain results in a much
more stable protein which contains a binding site
very similar to the binding site of the original
antibody. In this way a single polypeptide chain can be
engineered from the naturally occurring
two-polypeptide chain variable region, so as to create a
polypeptide of only one chain, but maintaining the binding
site of the antibody.
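
At the sequence level, the construction just described is a concatenation of three pieces. The following Python sketch shows that assembly under stated assumptions: the function name build_single_chain is hypothetical, sequences are plain strings of one-letter codes, tau 1 and sigma 1 are given as 1-based positions within their own domains, and the example sequences are placeholders rather than real antibody sequences.

```python
def build_single_chain(domain1, domain2, tau1, sigma1, linker):
    """Join domain 1 (through tau 1), the linker, and domain 2 (from sigma 1).

    Returns the single-chain sequence and the number of native residues lost.
    """
    kept1 = domain1[:tau1]         # N terminus of domain 1 through tau 1
    kept2 = domain2[sigma1 - 1:]   # sigma 1 through the C terminus of domain 2
    lost = (len(domain1) - tau1) + (sigma1 - 1)
    return kept1 + linker + kept2, lost


# Placeholder sequences: upper case for domain 1, lower case for domain 2.
chain, lost = build_single_chain("ABCDEFG", "tuvwxyz", tau1=5, sigma1=3, linker="--")
```

The count of lost residues is the quantity the operator trades against linker length when choosing sites.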

In the current mode of the present invention, the
expert operator selects the sites with minimal help
from the computer. The computer prepares the table of
closest-residue-in-other-domain. The computer can
provide more help in the following ways.

(1) Prepare a list of conserved and variable
residues for variable regions of antibodies (Fv region).
Residues which vary from Fv to Fv would be much better
starting or ending sites for linkage than are residues
which are conserved over many different Fv sequences.

(2) Prepare a list of solvent accessibilities.
Amino acids exposed to solvent can be substituted with
less likelihood of destabilizing the native structure
than amino acids buried within the native structure.
Exposed amino acids are better choices to start or end 
linkage. 

With respect to each of the plurality of possible
first sites (on domain 1 or light chain L) there are
available a plurality of second sites (on domain 2 or
heavy chain H) (see Figures 7 and 8A). As the second
site is selected closer to the N terminus of domain 2,
the distance to any of the plausible first sites
increases. Also, as the first site is selected closer
to the C terminus of domain 1, the distance to any of
the plausible second sites increases. It is this
tension between shortness of linker and retention of
native protein which the expert operator resolves in
choosing gaps to be linked. The penalties for including
extra sites in the list of gaps are:

(1) searching in general step 2 will be slower;
and

(2) more candidates will pass from step 2, many of
which must be rejected in step 3. As step 3 is
currently a manual step, this is the more serious
penalty.

Figure 3B shows diagrammatically by a directional arrow
the possible links that can occur between the various

Proteins usually exist in aqueous solution.
Although protein coordinates are almost always
determined for proteins in crystals, direct contacts
between proteins are quite rare. Protein crystals
contain from 20% to 90% water by volume. Thus one
usually assumes that the structure of the protein in
solution will be the same as that in the crystal. It is
now generally accepted that the solution structure of
a protein will differ from the crystal structure only
in minor details. Thus, given the coordinates of the
atoms, one can calculate quite easily the solvent
accessibility of each atom.

In addition, the coordinates implicitly give the 
charge distribution throughout the protein. This is 
of use in estimating whether a hypothetical molecule 
(made of native protein and one or more linkers) will
fold as designed. The typical protein whose structure 
is known comprises a chain of amino acids (there are 
21 types of amino acids) in the range of 100 to 300 
amino acids. 

Each of these amino acids alone or in combination 
with the other amino acids as found in the known pro- 
tein molecule can be used as a fragment to bridge the 
two sites. The reason that known protein molecules 
are used is to be able to use known protein fragments 
for the linker or bridge. 

Even with only 250 proteins of known structure, 
the number of possible known fragments is very large. 
A linker can be from one to twenty or thirty amino
acids long. Let "Lmax" be the maximum number of amino
acids allowed in a linker; for example, Lmax might be
25. Consider a protein of "Naa" amino acids.
Proteins have Naa in the range 100 to 800; 250 is
typical. From this protein one can select Naa-1 distinct
two-amino-acid linkers, Naa-2 distinct three-amino-acid
linkers, and (Naa+1-Lmax) distinct linkers
containing exactly Lmax amino acids. The total number of
linkers containing Lmax or fewer amino acids is "Nlink,"

$$ N_{link} = \sum_{j=1}^{L_{max}} (N_{aa} + 1 - j)
            = N_{aa} L_{max} - \frac{L_{max}^{2}}{2} + \frac{L_{max}}{2} $$

If Naa is 250 and Lmax is 25, Nlink will be 5975. If
the number of known proteins is "Nprot," then the
total number of linkers, "Nlink_total," will be

$$ N_{link\_total} = \sum_{k=1}^{N_{prot}} \sum_{j=1}^{L_{max}} (N_{aa}(k) + 1 - j)
   = \sum_{k=1}^{N_{prot}} \left[ N_{aa}(k) L_{max} - \frac{L_{max}^{2}}{2} + \frac{L_{max}}{2} \right]
   = N_{prot} \frac{L_{max} - L_{max}^{2}}{2} + L_{max} \sum_{k=1}^{N_{prot}} N_{aa}(k) $$

where Naa(k) is the number of amino acids in the kth
protein. With 250 proteins, each containing 250 amino
acids (on average), and Lmax set to 25, Nlink_total is
1,425,000.
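
The counting argument can be checked numerically. The Python sketch below evaluates the sum directly and through the closed form; the function names are hypothetical and the sketch assumes every protein is at least Lmax residues long. For Naa = 250 and Lmax = 25 the formula as written evaluates to 5950 per protein and 1,487,500 for 250 such proteins, close to the figures of 5975 and 1,425,000 quoted in the text; the difference does not affect the order-of-magnitude argument that follows.

```python
def nlink(naa, lmax):
    """Direct count: for each linker length j there are naa + 1 - j linkers."""
    return sum(naa + 1 - j for j in range(1, lmax + 1))


def nlink_closed_form(naa, lmax):
    """Naa*Lmax - Lmax^2/2 + Lmax/2, kept in integers (Lmax^2 - Lmax is even)."""
    return naa * lmax - (lmax * lmax - lmax) // 2


def nlink_total(protein_lengths, lmax):
    """Sum of the per-protein linker counts over all known proteins."""
    return sum(nlink(naa, lmax) for naa in protein_lengths)


per_protein = nlink(250, 25)
total = nlink_total([250] * 250, 25)
```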

This is the number of linkers of known structure. 
If one considers the number of possible amino acid 
sequences up to length Lmax (call it "Nlink_possible"), 
it is much larger. 



$$\mathrm{Nlink\_possible} = \sum_{j=1}^{\mathrm{Lmax}} 20^{j}$$

For Lmax = 25, 

$$\mathrm{Nlink\_possible} = 353{,}204{,}547{,}368{,}421{,}052{,}631{,}578{,}947{,}368{,}420 = 3.53 \times 10^{32}$$
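
The sum over all possible sequences can be evaluated exactly with 
arbitrary-precision integers. A minimal sketch follows; the 
function name is hypothetical and the alphabet size of 20 is taken 
from the sum above. 

```python
def nlink_possible(lmax: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length 1..lmax over the given alphabet."""
    return sum(alphabet ** j for j in range(1, lmax + 1))


total = nlink_possible(25)
print(f"{total:,}")
print(f"{float(total):.2e}")
```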

Using known peptide fragments thus reduces the 
possibilities by twenty-six orders of magnitude. 
Appropriate searching through the known peptide 
fragments reduces the possibilities a further five 
orders of magnitude. 

Essentially, the present invention utilizes a 
selection strategy for reducing a list of possible 
candidates. This is done as explained below in a 
preferred form in a three step process. This three step 
process, as is illustrated in the explanation of 
each of the three steps of the process, significantly 
reduces the computer time required to extract the most 
promising candidates from the data base of possible 
candidates. This should be contrasted with a serial 
search throughout the entire data base of candidates, 
which would require all candidates to be examined in 
total. The present invention examines certain speci- 
fic parameters of each candidate, and uses these para- 
meters to produce subgroups of candidates that are 
then examined by using other parameters. In this way, 
the computer processing speed is significantly in- 
creased. 
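
The winnowing strategy can be pictured as a cascade of tests in 
which each test sees only the survivors of the previous one. The 
sketch below is a generic illustration, not the inventors' 
program; `cascade` and the toy integer tests are hypothetical 
stand-ins for the distance, direction, and orientation tests. 

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cascade(
    candidates: Iterable[T], tests: Sequence[Callable[[T], bool]]
) -> Tuple[List[T], List[int]]:
    """Apply tests in order; return survivors and how many items each test examined."""
    survivors = list(candidates)
    examined = []
    for test in tests:
        examined.append(len(survivors))  # work done by this stage
        survivors = [c for c in survivors if test(c)]
    return survivors, examined


# Toy example: three successive tests on the integers 0..99.
kept, examined = cascade(
    range(100),
    [lambda n: n % 2 == 0, lambda n: n % 3 == 0, lambda n: n > 50],
)
print(kept, examined)
```

Placing the cheapest test first means the more expensive tests 
run only on the shrinking list of survivors. 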

The best mode of the present invention uses a 
protein data base created and supplemented by the 
Brookhaven National Laboratory in Upton, Long Island, 
New York. This data base is called the Brookhaven 
Protein Data Base (BPDB). It provides the physical and 
chemical parameters that are needed by the present 
invention. It should be understood that the candidate 
linkers can be taken from the Brookhaven Protein 
Data Base or any other source of three-dimensional 
protein structures. These sources must accurately 
represent the proteins. In the current embodiment, 
X-ray structures determined at a resolution of 2.5 A or 
higher and appropriately refined were used. Each 
peptide is replaced (by least-squares fit) by a standard 
planar peptide with standard bond lengths and angles. 
Peptides which do not accurately match a standard 
peptide (e.g. cis peptides) are not used to begin or end 
linkers, but may appear in the middle. 

Each sequence up to some maximum number of amino 
acids (Lmax) is taken as a candidate. In the prefer- 
red embodiment, the maximum number of amino acids 
(Lmax) is set to 30. However, the present invention 
is not limited to this number, but can use any maximum 
number that is desired under the protein engineering 
circumstances involved. 
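
The enumeration of candidates can be sketched as a sliding window 
over each chain. In the hypothetical helper below, residues are 
referred to by 0-based index, and `nonstandard` is an assumed set 
of indices whose peptides do not match the standard peptide (for 
example cis peptides); such residues may lie inside a candidate 
but may not begin or end it. 

```python
def candidate_windows(n_residues, lmax, nonstandard=frozenset()):
    """Yield (start, length) for every contiguous fragment of 1..lmax residues.

    Fragments whose first or last residue index is in `nonstandard` are skipped.
    """
    for length in range(1, lmax + 1):
        for start in range(0, n_residues - length + 1):
            end = start + length - 1
            if start in nonstandard or end in nonstandard:
                continue
            yield start, length


# A five-residue chain whose third residue (index 2) is non-standard.
print(list(candidate_windows(5, 3, {2})))
```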

1. Selecting Candidates with Proper Distance 
Between the N Terminal and the C Terminal 

The first step in the candidate selection step 
is to select the candidate linkers with a proper 
distance between the N terminal and the C terminal from 
all of the candidate linkers that exist in the protein 
data base that is being used. Figure 9 shows in block 
diagram form the steps that make up this candidate 
selection process utilizing distance as the selection 
parameter. 

Referring to Figure 9, a standard point relative 
to the peptide unit at the first site is selected, as 
shown by block 902. 

A standard point relative to the peptide unit in 
the second site is also picked, as indicated by a 
block 904. Note that in the best mode the geometric 
centers of the peptide units of the first and second 
sites are used, but any other standard point can be 
utilized, if desired. 

The distance between the standard points of the 
two peptides at the first and second sites defining 
the gap to be bridged by the linker is then 
calculated, as indicated by block 906. This scalar distance 
value is called the span of the gap. Note that this 
scalar value does not include any directional 
information. 

Next, as indicated by a step 908, the distance 
between the ends of each possible linker candidate is 
calculated. The distance between the ends of a 
particular candidate is called the span of the candidate. 
Note that each possible linker candidate has its own 
span of the candidate scalar value. 

The final step in the distance-based candidate 
selection process is step 910. In step 910, 
candidates are discarded whose span of the candidate 
values differ from the span of the gap value by more 
than a preselected amount (this preselected amount is 
Max LSQFIT error). In the best mode of the present 
invention, the preselected amount for Max LSQFIT error 
is 0.50 Angstroms. However, any other suitable value 
can be used. 

The preceding discussion has been for a single 
gap. In fact, the expert user often selects several 
gaps and the search uses all of them. The span of 
each candidate is compared to the span of each gap 
until it matches one, within the preset tolerance, or 
the list of gaps is exhausted. If the candidate mat- 
ches none of the gaps, it is discarded. If it matches 
any gap it is carried to the next stage. 
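
The distance test with several gaps can be sketched as follows. 
The function names are hypothetical; the default tolerance of 0.5 
Angstroms is the best-mode value given above, and coordinates are 
assumed to be (x, y, z) tuples in Angstroms. 

```python
import math


def span(first_point, last_point):
    """Scalar distance between the standard points at the two ends."""
    return math.dist(first_point, last_point)


def matching_gap(candidate_span, gap_spans, tolerance=0.5):
    """Return the index of the first gap whose span matches, or None to discard."""
    for index, gap_span in enumerate(gap_spans):
        if abs(candidate_span - gap_span) <= tolerance:
            return index
    return None


cand = span((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
print(cand, matching_gap(cand, [7.0, 5.4]))
```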

The inventors have determined that the use of 
distance as the first parameter for discarding 
possible linker candidates results in a significant 
reduction in the number of possible candidates with a 
minimum amount of computer time. In terms 
of the amount of reduction, a representative example 
(using linkers up to 20 amino acids) starts out with 
761,905 possible candidates that are in the protein 
data base. This selection of candidates using the 
proper distance parameter winnows this number down to 
approximately 63,727 possible candidates. As is 
discussed below, the distance selection operation 
requires much less computer time than is required by the 
other two steps which make up this selection step 304. 

The result of this selection of candidates according 
to proper distance is a group (called a first 
group of candidates) which exhibit a proper length as 
compared to the gap that is to be bridged or linked. 
This first group of candidates is derived from the 
protein data base using the distance criterion only. 

2. Selecting Candidates with Proper Direction from N 
Terminal to C Terminal 

This substep essentially creates a second group of 
possible candidates from the first group of possible 
candidates which was produced by the distance 
selection substep discussed in connection with Figure 9. 
The second group of candidates is selected in 
accordance with the orientation of the C terminal residue 
(i.e. the final residue) of the linker with respect to 
the N terminal residue (i.e. the initial residue), 
which is compared to the orientation of the C terminal 
residue (i.e. the second site) of the gap with respect 
to the N terminal residue (i.e. the first site). See 
Figure 20B. In this way, this direction evaluation 
determines if the chain of the linker ends near the 
second site of the gap, when the amino terminal amino 
acid of the linker is superimposed on the first site 
of the gap so as to produce the minimum amount of 
unwanted molecular distortion. 

Referring now to Figure 10, the first step used in 
producing the second group of possible candidates is a 
step 1002. In step 1002 a local coordinate system is 
established on the N terminal residue of one of the 
selected gaps. For example, one might take the local 
X-axis as running from the first alpha carbon of the N 
terminal residue to the second alpha carbon of the N 
terminal residue, with the first alpha carbon at the 
origin and the second alpha carbon on the plus X-axis. 
The local Y-axis is selected so that the carbonyl 
oxygen lies in the xy plane with a positive y coordinate. 
The local Z-axis is generated by crossing X into Y. 
Next, as indicated by step 1004, a standard reference 
point in the C terminal residue of the gap is located 
and its spherical polar coordinates are calculated in 
the local system. The standard reference point could 
be any of the atoms in the C terminal peptide 
(throughout this application, peptide, residue, and 
amino acid are used interchangeably) or an average of 
their positions. Steps 1002 and 1004 are repeated for 
all gaps in the list of gaps. As indicated by step 
1006, a local coordinate system is established on the 
N terminal residue of one of the candidates. This 
local coordinate system must be established in the 
same manner used for the local coordinate systems 
established on each of the gaps. Various local systems 
could be used, but one must use the same definition 
throughout. In step 1008, the standard reference 
point is found in the C terminal residue of the 
current candidate. This standard point must be chosen in 
the same manner used for the gaps. The spherical 
polar coordinates of the standard point are calculated in 
the local system of the candidate. (This use of local 
coordinate systems is completely equivalent to rotating 
and translating all gaps and all candidates so that 
their initial peptide lies in a standard position at 
the origin.) In step 1010, the spherical polar 
coordinates of the gap vector (r, theta, phi) are compared 
to the spherical polar coordinates of the candidate 
vector (r, theta, phi). In step 1012 a preset 
threshold is applied. If the two vectors are close 
enough, then one proceeds to step 1014 and enrolls the 
candidate in the second group of candidates. Currently 
this preset threshold is set to 0.5 A. Other 
values could be used. From step 1014, one skips forward 
to step 1022. If the vectors compared in step 1012 are 
not close enough, one moves to the next gap vector in 
the list, in step 1016. If there are no more gaps, one 
goes to step 1018 where the candidate is rejected. If 
there are more gaps, step 1020 increments the gap counter 
and one returns to step 1010. From steps 1014 or 1018 
one comes to step 1022 where one tests to see if all 
candidates have been examined. If not, the candidate 
counter is incremented and one returns to step 
1006. If all candidates have been examined, one has 
finished, step 1026. 

Figure 11 shows the concept of comparing the 
direction of the gap to the direction of the candidate. 
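
The local coordinate system and the direction comparison can be 
sketched in a few lines. This is an illustration under stated 
assumptions, not the inventors' code: all function names are 
hypothetical, atoms are (x, y, z) tuples in Angstroms, and 
closeness of the two vectors is measured here as the Euclidean 
distance between them in the local frame, which is one possible 
reading of the 0.5 A threshold. 

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(ca_first, ca_second, carbonyl_o):
    """Origin at the first alpha carbon, +X toward the second alpha carbon,
    carbonyl oxygen in the xy plane with positive y, Z = X cross Y."""
    x = _unit(_sub(ca_second, ca_first))
    v = _sub(carbonyl_o, ca_first)
    y = _unit(_sub(v, tuple(_dot(v, x) * c for c in x)))  # remove the X component
    return ca_first, (x, y, _cross(x, y))


def to_local(point, frame):
    """Express a point in the local frame."""
    origin, axes = frame
    rel = _sub(point, origin)
    return tuple(_dot(rel, axis) for axis in axes)


def spherical(v):
    """Return (r, theta, phi) with theta measured from +Z and phi from +X."""
    r = math.sqrt(_dot(v, v))
    theta = math.acos(max(-1.0, min(1.0, v[2] / r))) if r else 0.0
    return r, theta, math.atan2(v[1], v[0])


def direction_matches(gap_vector, candidate_vector, threshold=0.5):
    """True if the two local end-point vectors agree to within the threshold."""
    return math.dist(gap_vector, candidate_vector) <= threshold
```

Because both the gap and the candidate are expressed in frames 
built the same way, the comparison needs no explicit 
superposition. 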

WO 88/01649 PCT/US87/02208 

The inventors have determined that in the example 
discussed above where 761,905 possible candidates are 
in the protein data base, the winnowing process in 
this step reduces the approximately 63,727 candidates in 
the first group to approximately 50 candidates in the 
second group. The inventors have also determined that 
as referenced to the units of computer time referred 
to above in connection with the scalar distance 
parameter, it takes approximately 4 to 5 computer units of 
time to perform the selection of this step. Thus, it can 
be appreciated that it preserves computer time to 
perform the distance selection first and the direction 
selection second, since the direction selection 
process takes more time than the distance selection 
process. 
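
The saving from running the distance test first can be estimated 
with simple arithmetic. In the sketch below the candidate counts 
come from the example in the text; the cost of 1 unit per distance 
test and the use of 4.5 units (the midpoint of "4 to 5") per 
direction test are assumptions made only for illustration. 

```python
N_ALL = 761_905             # candidates in the data base (from the text)
N_AFTER_DISTANCE = 63_727   # survivors of the distance test (from the text)
COST_DISTANCE = 1.0         # assumed cost per distance test, in computer units
COST_DIRECTION = 4.5        # assumed midpoint of "4 to 5" units per direction test

# Distance test on everything, direction test on the survivors only.
distance_first = N_ALL * COST_DISTANCE + N_AFTER_DISTANCE * COST_DIRECTION
# Direction test applied directly to every candidate.
direction_only = N_ALL * COST_DIRECTION

print(distance_first, direction_only, round(direction_only / distance_first, 2))
```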

3. Selecting Candidates with Proper Orientation 
at Both Termini 

In this step, the candidates in the second group 
of step 1014 of Figure 10 are winnowed down to produce 
a third group of plausible candidates using an 
evaluation of the relative orientation between the peptide 
groups at either end of the candidate, compared to the 
relative orientation between the peptide groups at 
either end of the gap. In a step 1201 (Figure 12), one 
decides that a peptide will be represented by 3, 4, or 
5 atoms (vide infra). Specifically, in a step 1202, 
one of the candidates in the second group (step 1014) 
is selected for testing. In a step 1204, three to 
five atoms in the first peptide are selected to define 
the orientation of the first peptide. So long as the 
atoms are not collinear, three atoms is enough, but 
using four or five atoms makes the least-squares 
procedure which follows over-determined and therefore 
compensates for errors in the coordinates. For 
example, assume selection of four atoms: C alpha, C, N, 
and C beta. Next, in a step 1206, one selects the 
corresponding 3, 4, or 5 atoms from the final peptide 
of the selected candidate. These 6, 8, or 10 atoms 
define a three-dimensional object. In a step 1208, 
select one of the gaps. Select the corresponding 6, 
8, or 10 atoms from the gap. In a step 1210, 
least-squares fit the atoms from the candidate to the atoms 
from the gap. This least-squares fit allows six degrees 
of freedom to superimpose the two three-dimensional 
objects. Assume that one object is fixed and the 
other is free to move. Three degrees of freedom 
control the movement of the center of the free object. 
Three other degrees of freedom control the orientation 
of the free object. In a step 1212, the result of the 
least-squares fit is examined. If the root-mean-square 
(RMS) error is less than some preset threshold, 
the candidate is a good fit for the gap being 
considered and is enrolled in the third group in a step 
1214. If, on the other hand, the RMS error is greater 
than the preset threshold, one checks to see if there 
is another gap in the list in a step 1216. If there 
is, one selects the next gap and returns to step 1208. 
If there are no more gaps in the list, the 
current candidate from the second group is rejected in 
step 1218. In step 1220, one checks to see if there 
are more candidates in the second group; if so, a new 
candidate is selected and the procedure is repeated. 
If there are no more candidates, one is finished (step 
1222). Again referring to a representative example, 
where linkers of length up to twenty amino acids were 
sought for a single gap, the 
protein data bank contained 761,905 possible linkers. 
Of these, 63,727 passed the distance test. The 
direction test removed all but 50 candidates. The 
orientation test passed only 1 candidate with RMS error 
less than or equal to 0.5 A. There were two 
additional candidates with RMS error between 0.5 A and 0.6 A. 
Moreover, the inventors have determined that it takes 
about 25 units of computer time to evaluate each 
candidate in group 2 to decide whether it should be 
selected for group 3. It can be appreciated now that 
the order selected by the inventors for the three 
steps of winnowing the candidates has been selected so 
that the early steps take less time per candidate than 
the following steps. The order of the steps used to 
select the candidate can be changed, however, and 
still produce the desired winnowing process. 
Logically, one might even omit steps one and two and pass all 
candidates through the least-squares process depicted 
in Figure 12 and achieve the same list of candidates, 
but at greater cost in computing. This may be done in 
the case of parallel processing where computer time is 
plentiful, but memory is in short supply. 
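
The RMS error of a six-degree-of-freedom least-squares fit can be 
computed without an explicit search over rotations. The text does 
not say which algorithm the inventors used; the sketch below uses 
Horn's quaternion formulation, finding the largest eigenvalue of 
the 4x4 key matrix by Newton iteration on its characteristic 
polynomial. The function names are hypothetical, and the two atom 
lists are assumed to be equal-length lists of (x, y, z) tuples in 
corresponding order. 

```python
import math


def _det(m):
    """Determinant by cofactor expansion (adequate for 3x3 and 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * _det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def superposition_rms(mobile, fixed):
    """RMS error after optimal rigid superposition of `mobile` onto `fixed`."""
    n = len(mobile)
    cm = [sum(p[a] for p in mobile) / n for a in range(3)]
    cf = [sum(q[a] for q in fixed) / n for a in range(3)]
    P = [[p[a] - cm[a] for a in range(3)] for p in mobile]  # translation removed
    Q = [[q[a] - cf[a] for a in range(3)] for q in fixed]
    g = sum(x * x for p in P for x in p) + sum(x * x for q in Q for x in q)
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Characteristic polynomial of N: lam^4 + c2*lam^2 + c1*lam + c0.
    c2 = -2.0 * sum(x * x for row in S for x in row)
    c1 = -8.0 * _det(S)
    c0 = _det(N)
    lam = g / 2.0  # upper bound on the largest eigenvalue
    for _ in range(100):
        f = ((lam * lam + c2) * lam + c1) * lam + c0
        df = (4.0 * lam * lam + 2.0 * c2) * lam + c1
        if abs(df) < 1e-12:
            break
        step = f / df
        lam -= step
        if abs(step) < 1e-12 * max(1.0, abs(lam)):
            break
    return math.sqrt(max(g - 2.0 * lam, 0.0) / n)
```

A candidate would be enrolled in the third group when the returned 
value is below the preset threshold for at least one gap. 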

Another approach (not illustrated) for determining 
whether the proper orientation exists between the ends 
of the candidate, is to examine only the atoms at the 
C terminal of the candidate as compared to the atoms 
at the final peptide of the gap. In step 2, the in- 
ventors aligned the first peptide of the candidate 
with the first peptide in the gap. Having done this, 
one could merely compare the atoms at the C terminal 
of the candidate with the atoms of the second peptide 
of the gap. This approach is inferior to that discus- 
sed above because all the error appears at the C ter- 
minus, while the least-squares method discussed above 
distributes the errors evenly. 

Ranking and Eliminating Candidates 

As shown in Figure 3, the third general step in 
the present invention is that of ranking the plausible 
candidates from most plausible to least plausible, and 
eliminating those candidates that do not appear to be 
plausible based on criteria utilized by an expert 
operator and/or expert system. 

In the best mode, the candidates in the third 
group (step 1214) are provided to the expert operator, 
who can sequentially display them in three dimensions 
utilizing the computer-graphics display system 116. 
The expert operator then can make decisions about the 
candidates based on knowledge concerning protein 
chemistry and the physical relationship of the plausible 
candidate with respect to the gap being bridged. This 
analysis can be used to rank the plausible candidates 
in the third group from most plausible to least 
plausible. Based on these rankings, the most plausible 
candidates can be selected for genetic engineering. 

As noted above in connection with the illustrative 
example, there are typically few (under 100) 
candidates which make it to the third group of step 1214. 
Consequently, a moderately expert operator (one having 
a Bachelor of Science degree in chemistry, for 
example) can typically winnow down this number of 
plausible candidates to a group of 10 to 15. Thereafter, a 
more expert operator and/or expert system can further 
winnow down the number. In this way, only a very few 
of the plausible candidates need to be tested in 
practice as compared to the hundreds, thousands or 
more of candidates that would have to be tested if no 
selection process like that of the present invention 
was used. This speeds up the process of engineering 
the single chain molecules by orders of magnitude, 
while reducing costs and other detriments by orders of 
magnitude as well. 

In certain instances, however, automatic ranking 
in this third general step may be warranted. This 
could occur, for example, where the expert operator 
was presented with quite a few candidates in the third 
group, or where it is desired to assist the expert 
operator in making the ranking selections and 
eliminating candidates based on prior experience that has 
been derived from previous engineering activities 
and/or actual genetic engineering experiments. 

Referring now to Figure 13, a coordinate listing 
of the hypothetical molecule (candidate) is 
automatically constructed, as is indicated by a block 1302. 
The expert operator can then display using a first 
color the residues from domain 1 of the native 
protein. Color display 120 can provide a visual 
indication to the expert operator of where the residues 
lie in domain 1. This is indicated by a block 1304. 

The expert operator then can display on color 
display 120 the residues from domain 2 of the native 
protein using a second color, as is indicated by a block 
1306. The use of a second color provides a visual 
indication to the user which assists in distinguishing 
the residues from domain 1 from the residues from 
domain 2. 

The linker (candidate) being ranked can be dis- 
played in a selected color, which color can be differ- 
ent from the first color of step 1304 and/or the sec- 
ond color from step 1306. Again, by using this visual 
color indication, the expert operator can distinguish 
the residues of domain 1 and 2 of the native protein. 
This display of the linker candidate is indicated by a 
block 1308. 

The initial picture on the color display 120 
provided to the expert operator typically shows the alpha 
carbons for all of the residues. This is indicated by 
a block 1310. In addition, the initial picture shows 
the main-chain and side-chains for residues and 
linkers and one residue before the linker and one residue 
after the linker. This is indicated by a block 1312. 

The expert operator can also cause any of the 
other atoms in the native protein or linker candidate 
to be drawn at will. The molecule can be rotated, 
translated, and enlarged or reduced, by operator 
command, as was discussed generally in connection with 
the computer-graphics display system 116 above. The 
block diagram of Figure 13 indicates that each of the 
steps just discussed is accomplished in serial 
fashion. However, this is only for purposes of 
illustration. It should be understood that the operator can 
accomplish any one or more of these steps as well as 
other steps at will and in any sequence that is 
desired in connection with the ranking of the plausible 
candidates in group 3. 

The expert operator and/or expert system utilized 
in this third general step in ranking the candidates 
from most plausible to least plausible and in 
eliminating the remaining candidates from group 3, can use a 
number of different rules or guidelines in this 
selection process. Representative of these rules and 
guidelines are the following which are discussed in 
connection with Figure 14. Note that the blocks in Figure 
14 show the various rules and/or criteria, which are 
not necessarily utilized in the order in which the 
boxes appear. The order shown is only for purposes of 
illustration. Other rules and/or criteria can be 
utilized in the ranking process, as well. 

As shown in step 1402, a candidate can be rejected 
if any atom of the linker comes closer than a minimum 
allowed separation to any retained atom of the native 
protein structure. In the best mode, the minimum 
allowed separation is set at 2.0 Angstroms. Note that 
any other value can be selected. This step can be 
automated, if desired, so that the expert operator 
does not have to manually perform this elimination 
process. 
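
An automated version of the minimum-separation rule can be 
sketched as a brute-force pairwise check. The function name is 
hypothetical, atoms are assumed to be (x, y, z) tuples in 
Angstroms, and the default of 2.0 Angstroms is the best-mode value 
given above. 

```python
import math


def has_clash(linker_atoms, native_atoms, min_separation=2.0):
    """True if any linker atom lies closer than min_separation to a retained atom."""
    return any(
        math.dist(a, b) < min_separation
        for a in linker_atoms
        for b in native_atoms
    )


print(has_clash([(0.0, 0.0, 0.0)], [(1.5, 0.0, 0.0), (9.0, 0.0, 0.0)]))
```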

A candidate can be penalized if the hydrophobic 
residues have high exposure to solvent, as is 
indicated by a block 1404. The side chains of phenylalanine, 
tryptophan, tyrosine, leucine, isoleucine, methionine, 
and valine do not interact favorably with water and 
are called hydrophobic. Proteins normally exist in 
saline aqueous solution; the solvent consists of polar 
molecules (H2O) and ions. 

A candidate can be penalized when the hydrophilic 
residues have low exposure to solvent. The side 
chains of serine, threonine, aspartic acid, glutamic 
acid, asparagine, glutamine, lysine, arginine, and 
proline do interact favorably with water and are 
called hydrophilic. This penalization step for 
hydrophilic residues is indicated by a block 1406. 

A candidate can be promoted when hydrophobic resi- 
dues have low exposure to solvent, as is indicated by 
a block 1408. 

A candidate can be promoted when hydrophilic resi- 
dues have high exposure to solvent, as indicated by a 
block 1410. 
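
The four solvent-exposure rules can be combined into a simple 
score. The sketch below is only an illustration: the residue sets 
follow the lists given in the text, but the representation of 
exposure as a fraction between 0 and 1, the cut-offs for "high" 
and "low", and the unit weights are all assumptions. 

```python
HYDROPHOBIC = {"PHE", "TRP", "TYR", "LEU", "ILE", "MET", "VAL"}
HYDROPHILIC = {"SER", "THR", "ASP", "GLU", "ASN", "GLN", "LYS", "ARG", "PRO"}


def exposure_score(residues, high=0.5, low=0.2):
    """Score a linker from (residue_name, fractional_exposure) pairs.

    The thresholds and the +/-1 weights are illustrative assumptions.
    """
    score = 0
    for name, exposure in residues:
        if name in HYDROPHOBIC:
            if exposure >= high:
                score -= 1   # block 1404: exposed hydrophobic residue
            elif exposure <= low:
                score += 1   # block 1408: buried hydrophobic residue
        elif name in HYDROPHILIC:
            if exposure <= low:
                score -= 1   # block 1406: buried hydrophilic residue
            elif exposure >= high:
                score += 1   # block 1410: exposed hydrophilic residue
    return score


print(exposure_score([("LEU", 0.1), ("ASP", 0.9), ("GLY", 0.7)]))
```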

A candidate can be penalized when the main chain 
fails to form hydrogen bonds, as is indicated by a 
block 1412. 

A candidate can be penalized when the main chain 
makes useless excursions into the solvent region. 
Useless excursions are those which do not make any 
evident interaction with the retained native protein. 
This is indicated by a block 1414. 

A candidate can be promoted when the main chain 
forms a helix, as is indicated by a block 1416. 
Helices are self-stabilizing. Thus a linker which is 
helical will be more stable because its main-chain 
polar atoms (O and N) will form hydrogen bonds within 
the linker. 

As is indicated by a block 1418, a candidate can 
be promoted when the main chain forms a beta sheet 
which fits against existing beta sheets. The strands 
of beta sheets stabilize each other. If a linker were 
found which was in a beta sheet conformation such that 
it would extend an existing beta sheet, this 
interaction would stabilize both the linker and the native 
protein. 

Another expert design rule penalizes candidates 
which have sterically bulky side chains at undesirable 
positions along the main chain. Furthermore, it is 
possible to "save" a candidate with a bulky side chain 
by replacing the bulky side chain by a less bulky one. 
For example, if a side chain carries a bulky 
substituent such as leucine or isoleucine, a possible design 
step replaces this amino acid by a glycine, which has 
the least bulky side chain. 

Other rules and/or criteria can be utilized in the 
selection process of the third general step 306, and 
the present invention is not limited to the rules 
and/or criteria discussed. For example, once the 
linker has been selected it is also possible to add, 
delete, or as stated, modify one or more amino acids 
therein, in order to accomplish an even better 3-D 
fit. 

IV. Double and Multiple Linker Embodiments 

Section III above described the single linker em- 
bodiment in accordance with the present invention. 
This section describes double linker and multiple lin- 
ker embodiments in accordance with the present inven- 
tion. For brevity purposes, only the significant dif- 
ferences between this embodiment and the single linker 
embodiment will be described here and/or illustrated 
in separate figures. Reference should therefore be 
made to the text and figures that are associated with 
the single linker embodiment. 

A. Plausible Site Selection 

The two main goals of minimizing distance between 
the sites to be linked and the least loss of native 
protein apply in the site selection in the double and 
multiple linker embodiments as they did apply in the 
single linker embodiment discussed above. 

Figure 15A shows a simplified two dimensional rep- 
resentation of the use of two linkers to create the 
single polypeptide chain from the two naturally aggre- 
gated but chemically separate polypeptide chains. 
Figure 15B shows in two dimensions a three dimensional 
representation of the two chains of Figure 15A. 
Referring now to Figures 15A and B, the first step in 
determining suitable sites is to find a site in domain 1 
which is close to either the C or N terminus of domain 
2. For purposes of illustration, and as is shown in 
Figures 15A and 15B, it is assumed that the most 
promising location is the C terminus of domain 2. The 
residue in domain 1 is called Tau 1, while the residue 
in domain 2 is called Sigma 1. 

Figures 16A and 16B are respectively two dimen- 
sional simplified plots of the two chains, and two 
dimensional plots of the three dimensional representa- 
tion of the two chains. They are used in connection 
with the explanation of how plausible sites are 
selected for the second linker in the example situation. 

The first step in connection with finding 
plausible sites for the second linker is to find a residue 
in domain 1 that is before Tau 1 in the light chain. 
This residue is called residue Tau 2. It is shown in 
the top portion in Figure 16A, and in the right middle 
portion in Figure 16B. 

The next step in the site selection process for 
the second linker is to find a residue in domain 2 
near the N terminus of domain 2. This residue is 
called residue Sigma 2. Reference again is made to 
Figures 16A and B to show the location of Sigma 2. 

The second linker (linker 2) thus runs from Tau 2 
to Sigma 2. This is shown in Figures 17A and 17B. 
Note that the chain that is formed by these two 
linkers has the proper direction throughout. 

Figure 18 shows in two dimensional simplified form 
the single polypeptide chain that has been formed by 
the linking of the two independent chains using the 
two linkers. Note that the approach outlined above 
resulted in the minimal loss of native protein. The 
completely designed protein is shown in Figure 17 and 
consists of domain 1 from the N terminal to Tau 2, 
linker 2, domain 2 from Sigma 2 to Sigma 1, linker 1, 
and domain 1 from Tau 1 to the C terminus. The arrows 
that are shown in Figure 17 indicate the direction of 
the chain. 

Figure 17 shows that the residues lost by the 
utilization of the two linkers are: (a) from the N 
terminus of domain 2 up to the residue before Sigma 2; 
and (b) from the residue after Sigma 1 to the C termi- 
nus of domain 2; and (c) from the residue after Tau 2 
to the residue before Tau 1 of domain 1. 
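
The order in which the pieces are joined can be made concrete with 
sequences written as strings. In the sketch below the function 
name and the use of 0-based, inclusive residue indices are 
assumptions; the toy sequences are placeholders and not real 
antibody sequences. 

```python
def assemble_two_linker_chain(domain1, domain2, tau2, tau1,
                              sigma2, sigma1, linker2, linker1):
    """Join domain 1 (N terminus..Tau 2), linker 2, domain 2 (Sigma 2..Sigma 1),
    linker 1, and domain 1 (Tau 1..C terminus) into a single chain."""
    return (domain1[:tau2 + 1] + linker2
            + domain2[sigma2:sigma1 + 1] + linker1
            + domain1[tau1:])


def lost_residues(domain1, domain2, tau2, tau1, sigma2, sigma1):
    """Residues of the native chains that do not appear in the single chain."""
    return domain2[:sigma2], domain2[sigma1 + 1:], domain1[tau2 + 1:tau1]


chain = assemble_two_linker_chain("ABCDEFGH", "stuvwxyz", 2, 5, 1, 6, "22", "11")
print(chain, lost_residues("ABCDEFGH", "stuvwxyz", 2, 5, 1, 6))
```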

If one of the linkers in the two linker case is 
very long, one could link from Tau 2 to a residue in 
domain 2 after Sigma 1. A third linker (not shown) 
would then be sought from a residue near the C termi- 
nal of domain 2 to a residue near the N terminal of 
domain 2. 

Additionally, one could use two linkers to recon- 
nect one of the domains in such a way that a single 
linker or a pair of linkers would weld the two domains 
into one chain. 

B. Candidate Selection and Candidate Rejection Steps 

Ranking of linkers in the multilinker cases 
follows the same steps as in the single linker case 
except there are some additional considerations. 

(1) There may be a plurality of linkers for 
each of the two (or more) gaps to be closed. One must 
consider all combinations of each of the linkers for 
gap A with each of the linkers for gap B. 

(2) One must consider the interactions 
between linkers. 

As one must consider combinations of linkers, the 
ranking of individual linkers is used to cut down to a 
small number of very promising linkers for each gap. 
If one has only three candidates for each gap, there 
are nine possible constructs. 

The process of examining interactions between 
linkers and discarding poor candidates can be automated 
by applying the rules discussed above. 
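
Enumerating the constructs is a Cartesian product over the 
per-gap candidate lists. A minimal sketch, in which the function 
name and the candidate labels are placeholders: 

```python
from itertools import product


def constructs(linkers_by_gap):
    """Return every combination that takes one linker for each gap."""
    return list(product(*linkers_by_gap))


gap_a = ["A1", "A2", "A3"]
gap_b = ["B1", "B2", "B3"]
print(len(constructs([gap_a, gap_b])))
```

Each combination would then be screened for interactions between 
its linkers using the same rules applied to single linkers. 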

V. Parallel Processing Embodiment 

Figure 19 shows in block diagram form the parallel 
processing approach that can be utilized in the 
present invention. 

As shown in Figure 19, a friendly serial processor 
1902 is connected by a first bus 1904 to a plurality 
of data storage devices and input devices. 
Specifically, and only for purposes of illustration, a tape 
input stage 1906 is connected to bus 1904 so as to 
read into the system the parameters of the protein 
data base that is used. A high storage disk drive 
system 1908 (having, for example, 5 gigabits of 
storage) is also connected to bus 1904. 
Operationally, for even larger storage capabilities, 
an optical disk storage stage 1910 of conventional 
design can be connected to bus 1904. 

The goal of the hypercube 1912 that is connected 
to the friendly serial processor 1902 via a 
bi-directional bus 1914 is twofold: to perform searching 
faster, and to throw out candidates more automatically. 




The hypercube 1912, having for example, 2^^ to 2"^^ 
nodes provides for parallel processing. There are 
computers currently available which have up to 1,024 
computing nodes. Thus each node would need to hold 
only about 1400 candidate linkers and local memory of 
available machines would be sufficient. This is the 
concept of the hypercube 1912. Using the hypercube 
parallel processing approach, the protein data base 
can be divided into as many parts as there are 
computing nodes. Each node is assigned to a particular 
known protein structure. 
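
The figure of about 1400 candidates per node follows from dividing 
the roughly 1,425,000 known fragments estimated earlier among 
1,024 nodes. The sketch below checks that arithmetic and shows 
one simple way to split a candidate list; the round-robin split 
and the function names are assumptions, not the scheme described 
in the text. 

```python
import math


def per_node_load(n_candidates, n_nodes):
    """Largest number of candidates any node must hold under an even split."""
    return math.ceil(n_candidates / n_nodes)


def partition(items, n_nodes):
    """Deal items out to the nodes round-robin."""
    return [items[i::n_nodes] for i in range(n_nodes)]


print(per_node_load(1_425_000, 1024))
print(partition(list(range(10)), 3))
```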

The geometry of the gap that has to be bridged by 
a linker is sent by the friendly serial processor 1902 
via bus 1914 to the hypercube stage 1912. Each of the 
nodes in the hypercube 1912 then processes the geome- 
trical parameters with respect to the particular can- 
didate linker to which it is assigned. Thus, all of 
the candidates can be examined in a parallel fashion, 
as opposed to the serial fashion that is done in the 
present mode of the present invention. This results 
in much faster location of the candidates that can be 
evaluated by the second step 304 of the present 
invention (the inventors believe that the processing 
time can be brought down from 6 hours to 3 minutes 
using conventional technology). 

Another advantage for the parallel processing em- 
bodiment is that it will provide sufficient speed to 
allow candidates to be thrown out more automatically. 
This would be achieved using molecular dynamics and
energy minimization. While this could be done currently
on serial processing computers (of the supercomputer
variety such as those manufactured by Cray and Cyber),
the parallel processing approach will per[...]
minimization [...] inexpensive computing nodes [...]
favorably to supercomputers [...]. Molecular dynamics
and [...] partly vectorizable because the potential
[...] numerous data-dependent [...]

[...] described [...], therefore [...] genetic code,
to genetic [...] code, however, there [...] many
instances [...] which are [...]. Therefore, codon usage
rules, [...] as understood by those of [...] utilized
[...]. Generally, it is possible to utilize the [...]
sequences obtained from [...] the variable region [...]
starting point. These [...] means of genetic linkers
[...] peptide linker candidates elucidated [...]
invention. [...] joined together with [...] as
described.



A large source of hybridomas and their corresponding
monoclonal antibodies is available for the preparation
of sequences coding for the H and L chains of the
variable region. As indicated previously, it is well
known that most "variable" regions of antibodies of a
given class are in fact quite constant in their three
dimensional folding pattern, except for certain
specific hypervariable loops. Thus, in order to choose
and determine the binding specificity of the single
chain binding protein of the invention it becomes
necessary only to define the protein sequence (and
thus the underlying genetic sequence) of the
hypervariable region. The hypervariable region will
vary from binding molecule to binding molecule, but the
remaining domains of the variable region will remain
constant for a given class of antibody.

Source mRNA can be obtained from a wide range of
hybridomas. See for example the catalogue ATCC Cell
Lines and Hybridomas, December 1984, American Type
Culture Collection, 20309 Parklawn Drive, Rockville,
Maryland 20852, U.S.A., at pages 5-9. Hybridomas
secreting monoclonal antibodies reactive with a wide
variety of antigens are listed therein, are available
from the collection, and are usable in the invention.
Of particular interest are hybridomas secreting
antibodies which are reactive with viral antigens,
tumor associated antigens, lymphocyte antigens, and the
like. These cell lines and others of similar nature can
be utilized to copy mRNA coding for the variable region
or to determine the amino acid sequence from the
monoclonal antibody itself. The specificity of the
antibody to be engineered will be determined by the
original selection process. The class of antibody can
be determined by criteria known to those skilled in the
art. If the class is one for which there is a
three-dimensional structure, one needs only to replace
the sequences of the hypervariable regions (or
complementarity determining regions). The replacement
sequences will be derived from either the amino acid
sequence or the nucleotide sequence of DNA copies of
the mRNA.

It is to be specifically noted that it is not
necessary to crystallize and determine the 3-D
structure of each variable region prior to applying the
method of the invention. As only the hypervariable
loops change drastically from variable region to
variable region (the remainder being constant in the
3-D structure of the variable region of antibodies of a
given class), it is possible to generate many single
chain 3-D structures from structures already known or
to be determined for each class of antibody.

For example, linkers generated in the Examples in
this application (e.g., TRY40, TRY61 or TRY59, see
below) are for Fv regions of antibodies of the IgA
class. They can be used universally for any antibody,
having any desired specificity, especially if the
antibody is of the IgA class.

Expression vehicles for production of the molecules
of the invention include plasmids or other vectors. In
general, such vectors containing replicon and control
sequences which are derived from species compatible
with a host cell are used in connection with the host.
The vector ordinarily carries a replicon site, as well
as specific genes which are capable of providing
phenotypic selection in transformed cells. For example,
E. coli is readily transformed using pBR322, a plasmid
derived from an E. coli species. pBR322 contains genes
for ampicillin and tetracycline resistance, and thus
provides easy means for identifying transformed cells.
The pBR322 plasmid or other microbial plasmids must
also contain, or be modified to contain, promoters
which can be used by the microbial organism for
expression of its own proteins. Those promoters most
commonly used in recombinant DNA construction include
the beta lactamase and lactose promoter systems, lambda
phage promoters, and the tryptophan promoter systems.
While these are the most commonly used, other microbial
promoters have been discovered and can be utilized.

For example, a genetic construct for a single
chain binding protein can be placed under the control
of the leftward promoter of bacteriophage lambda.
This promoter is one of the strongest known promoters
which can be controlled. Control is exerted by the
lambda repressor, and adjacent restriction sites are
known.

The expression of the single chain antibody can
also be placed under control of other regulatory
sequences which may be homologous to the organism in
its untransformed state. For example, lactose dependent
E. coli chromosomal DNA comprises a lactose or lac
operon which mediates lactose utilization by
elaborating the enzyme beta-galactosidase. The lac
control elements may be obtained from bacteriophage
lambda plac5, which is infective for E. coli. The lac
promoter-operator system can be induced by IPTG.

Other promoter/operator systems or portions thereof
can be employed as well. For example, colicin E1,
galactose, alkaline phosphatase, tryptophan, xylose,
tac, and the like can be used.

Of particular interest is the use of the OL/PR
hybrid lambda promoter (see for example U.S. patent
application Serial Number 534,982 filed September 3,
1983, and herein incorporated by reference).

Other preferred hosts are mammalian cells, grown
in vitro in tissue culture, or in vivo in animals.
Mammalian cells provide post translational
modifications to immunoglobulin protein molecules
including correct folding or glycosylation at correct
sites.

Mammalian cells which may be useful as hosts
include cells of fibroblast origin such as VERO or
CHO-K1, or cells of lymphoid origin, such as the
hybridoma SP2/0-AG14 or the myeloma P3x63Sg8, and their
derivatives.

Several possible vector systems are available for
the expression of cloned single chain binding proteins
in mammalian cells. One class of vectors utilizes DNA
elements which provide autonomously replicating
extra-chromosomal plasmids, derived from animal viruses
such as bovine papilloma virus, polyoma virus, or SV40
virus. A second class of vectors relies upon the
integration of the desired gene sequences into the host
cell chromosome. Cells which have stably integrated
the introduced DNA into their chromosomes can be
selected by also introducing drug resistance genes such
as E. coli GPT or Tn5neo. The selectable marker gene
can either be directly linked to the DNA gene sequences
to be expressed, or introduced into the same cell
by co-transfection. Additional elements may also be
needed for optimal synthesis of single chain binding
protein mRNA. These elements may include splice
signals, as well as transcription promoters, enhancers,
and termination signals. cDNA expression vectors
incorporating such elements include those described by
Okayama, H., Mol. Cell. Biol., 3:280 (1983), and
others.

Another preferred host is yeast. Yeast provides 
substantial advantages in that it can also carry out 
post translational peptide modifications including 
glycosylation. A number of recombinant DNA strategies 
exist which utilize strong promoter sequences and high 
copy number of plasmids which can be utilized for pro- 
duction of the desired proteins in yeast. Yeast re- 
cognizes leader sequences on cloned mammalian gene 
products, and secretes peptides bearing leader
sequences (i.e., pre-peptides).

Any of a series of yeast gene expression systems 
incorporating promoter and termination elements from 
the actively expressed genes coding for glycolytic 
enzymes produced in large quantities when yeasts are 
grown in mediums rich in glucose can be utilized. 
Known glycolytic genes can also provide very efficient 
transcription control signals. For example, the pro- 
moter and terminator signals of the phosphoglycerate 
kinase gene can be utilized. 

Once the strain carrying the single chain binding
molecule gene has been constructed, the same can also
be subjected to mutagenesis techniques using chemical
agents or radiation, as is well known in the art.
From the colonies thus obtained, it is possible to
search for those producing binding molecules with
increased binding affinity. In fact, if the first
linker designed with the aid of the computer fails to
produce an active molecule, the host strain containing
the same can be mutagenized. Mutant molecules capable
of binding antigen can then be screened by means of a
[...]
The expressed and refolded single chain binding
proteins of the invention can be labelled with
detectable labels such as radioactive atoms, enzymes,
biotin/avidin labels, chromophores, chemiluminescent
labels, and the like for carrying out standard
immunodiagnostic procedures. These procedures include
competitive and immunometric (or sandwich) assays. These
assays can be utilized for the detection of antigens
in diagnostic samples. In competitive and/or sandwich
assays, the binding proteins of the invention can also
be immobilized on such insoluble solid phases as
beads, test tubes, or other polymeric materials.

For imaging procedures, the binding molecules of
the invention can be labelled with opacifying agents
such as NMR contrasting agents or X-ray contrasting
agents. Methods of binding labelling or imaging
agents to proteins, as well as binding the proteins to
insoluble solid phases, are well known in the art. The
refolded protein can also be used for therapy when
labelled or coupled to enzymes or toxins, and for
purification of products, especially those produced by
the biotechnology industry. The proteins can also be
used in biosensors.

Having now generally described this invention the
same will be better understood by reference to certain
specific examples which are included for purposes of
illustration and are not intended to be limiting
unless otherwise specified.

EXAMPLES

In these experiments, the basic Fv 3-D structure
used for the computer assisted design was that of the
anti-phosphoryl choline myeloma antibody of the IgA
class, MCPC-603. The X-ray structure of this antibody
is publicly available from the Brookhaven data base.

The starting material for these examples was
monoclonal antibody cell line 3C2 which produced a
mouse anti-bovine growth hormone (BGH). This antibody
is an IgG1 with a gamma 1 heavy chain and kappa light
chain. cDNAs for the heavy and light chain sequences
were cloned and the DNA sequence determined. The
nucleotide sequences and the translation of these
sequences for the mature heavy and mature light chains
are shown in Figures 21 and 22 respectively.

Plasmids which contain just the variable region of
the heavy and light chain sequences were prepared. A
ClaI site and an ATG initiation codon (ATCGATG) were
introduced before the first codon of the mature
sequences by site directed mutagenesis. A HindIII site
and termination codon (TAAGCTT) were introduced after
the codon 123 of the heavy chain and the codon 109 of
the light chain. The plasmid containing the VH
sequences is pGX3772 and that containing the VL is
pGX3773 (Figure 23).
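
The two junction sequences above are compact because the
restriction site and the codon overlap in each case. A
small Python check makes this explicit. The recognition
sequences for ClaI (ATCGAT) and HindIII (AAGCTT) are
standard; the function name and the short coding sequence
are hypothetical placeholders, not the 3C2 sequences.

```python
CLAI = "ATCGAT"      # ClaI recognition sequence
HINDIII = "AAGCTT"   # HindIII recognition sequence


def add_flanks(coding_sequence):
    """Place ATCGATG before the first codon and TAAGCTT after the last."""
    return "ATCGATG" + coding_sequence + "TAAGCTT"


# Hypothetical three-codon insert, used only for illustration.
insert = "GAAGTTAAA"
construct = add_flanks(insert)

# ATCGATG: the last two bases of the ClaI site begin the ATG codon.
start = construct.index(CLAI)
atg = construct.index("ATG", start)

# TAAGCTT: the last two bases of the TAA stop begin the HindIII site.
stop = construct.rindex("TAA")
hind = construct.rindex(HINDIII)

# The stop codon must sit in the same reading frame as the ATG.
in_frame = (stop - atg) % 3 == 0
```

The overlap means seven bases at each end carry both a
cloning site and a translation signal.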

The examples below were constructed and produced
by methods known to those skilled in the art.

EXAMPLE 1
A. Computer Design

A two-linker example (referred to as TRY40) was
designed by the following steps.

First, it was observed that light chains were much
easier to make in E. coli than were heavy chains. It
was thus decided to start with light chain. (In the
future, one could certainly make examples beginning
with heavy chain because there is a very similar
contact between a turn in the heavy chain and the exit
strand of the light chain.)

Refer to stereo Figure 30A, which shows the light
and heavy domains of the Fv from MCPC-603 antibody;
the constant domains are discarded. A line joining
the alpha carbons of the light chain is above and
dashed. The amino terminus of the light chain is to
the back and at about 10 o'clock from the picture
center and is labeled "N." At the right edge of the
picture, at about 2 o'clock is an arrow showing the
path toward the constant domain. Below the light
chain is a line joining the alpha carbons of the heavy
chain. The amino terminus of the heavy chain is
toward the viewer at about 7 o'clock and is also
labeled "N." At about 4:30, one sees an arrow showing
the heavy chain path to its constant domain.

The antigen-binding site is to the left, about 9
o'clock and between the two loops which project to the
right above (light chain) and below (heavy chain).

In addition to the alpha carbon traces, there are
three segments in which all non-hydrogen atoms have
been drawn. These strands are roughly parallel and run
from upper right to lower left. They are
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(a) Proline 46 to Proline 50 of the light chain. 

(b) Valine 111 to Glycine 113 of the heavy chain. 

(c) Glutamic acid 1 to glycine 10 of the heavy 
chain. 

The contact between tryptophan 112 of the heavy
chain and proline 50 of the light chain seems very
favorable. Thus it was decided that these two
residues should be conserved. Several linkers were
sought and found which would join a residue at or
following Tryptophan 112 (heavy) to a residue at or
following Proline 50 (light). Stereo Figure 30B shows
the region around TRP 112H in more detail. The letter
"r" stands between the side-chain of TRP 112H and PRO
50L; it was wished to conserve this contact. The letter
"q" labels the carboxy terminal strand which leads
towards the constant domain. It is from this strand
that a linker will be found which will connect to PRO
50L.

Once a linker is selected to connect 112H to 50L,
one needs a linker to get from the first segment of
the light chain into the beginning portion of the
heavy chain. Note that PRO 46L turns the chain toward
PRO 50L. This turning seemed very useful, so it was
decided to keep PRO 46L. Thus the second linker had
to begin after 46L and before 50L, in the stretch
marked "s." A search for linkers was done beginning
on any of the residues 46L, 47L, or 48L. Linkers
beginning on residue 49L were not considered because
the chain has already turned toward 50L and away from
the amino terminal of the heavy chain. Linkers were
sought which ended on any of the residues 1H to 10H.

Figure 30C shows the linked structure in detail.
After TRP 112H and GLY 113H, was introduced the
sequence PRO-GLY-SER, and then comes PRO 50L. A
computer program was used to look for short contacts
between atoms in the linker and atoms in the retained
part of the Fv. There is one short contact between
the beta carbon of the SER and PRO 50L, but small
movements would relieve that. This first linker runs
from the point labeled "x" to the point labeled "y."
The second linker runs from "v" to "w." Note that
most of the hydrophobic residues (ILE and VAL) are
inside. There is a PHE on the outside. In addition,
the two lysine residues and the asparagine residue are
exposed to solvent as they ought to be. Figure 30D
shows the overall molecule linked into a single chain.
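
The short-contact check mentioned above amounts to a
pairwise distance test between linker atoms and retained
atoms. The following Python sketch shows the idea under
stated assumptions: the coordinates are invented, the 3.0
angstrom cutoff is an arbitrary illustrative threshold, and
the function name is hypothetical. It is not the program
used by the inventors.

```python
import math


def short_contacts(linker_atoms, retained_atoms, cutoff=3.0):
    """Return (linker atom, retained atom, distance) for every pair of
    atoms that lie closer together than the cutoff, in angstroms."""
    contacts = []
    for name_a, xyz_a in linker_atoms:
        for name_b, xyz_b in retained_atoms:
            d = math.dist(xyz_a, xyz_b)
            if d < cutoff:
                contacts.append((name_a, name_b, round(d, 2)))
    return contacts


# Invented coordinates, for illustration only.
linker = [("SER CB", (0.0, 0.0, 0.0)), ("GLY CA", (6.0, 0.0, 0.0))]
retained = [("PRO 50L CD", (2.5, 0.0, 0.0)), ("TRP 112H CZ2", (0.0, 8.0, 0.0))]
found = short_contacts(linker, retained)
```

A pair flagged this way is a candidate for the kind of
small adjustment the text describes, or for rejection of
the linker if the clash cannot be relieved.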
B. Genetic Constructs

These constructs, and the plasmids containing them,
were prepared using E. coli hosts. Once constructed,
the sequences can be inserted into whichever
expression vehicle is used in the organism of choice.

The first construction was TRY40 (the two-linker
construction) which produces a protein with the
following sequence:

Met-[L-chain 1-41]-Ile-Ala-Lys-Ala-Phe-Lys-Asn-[H-
chain 8-105]-Pro-Gly-Ser-[L-chain 45-109]. The
nucleotide sequence and its translation are seen in
Figure 24. The hypervariable regions in TRY40 (as in
TRY61, 59 and 104B, see below) correspond, as indicated,
to an IgG1 anti BGH antibody, even though the 3-D
analysis was done on the Fv region of MCPC-603
antibody, having a different specificity (anti-phosphoryl
choline) but having a similar framework in the
variable region.

The antibody sequences in the plasmids pGX3772 and
pGX3773 were joined to give the sequence of TRY40 in
the following manner. The plasmids used contained an
M13 bacteriophage origin of DNA replication. When
hosts containing these plasmids are superinfected with
bacteriophage M13 two types of progeny are produced,
one containing the single-strand genome and the other
containing a specific circular single-strand of the
plasmid DNA. This DNA provided template for the
oligonucleotide directed site specific mutagenesis
experiments that follow. Template DNA was prepared from
the two plasmids. An EcoRI site was introduced before
codon 8 of the VH sequence in pGX3772, by site
directed mutagenesis, producing pGX3772'. Template from
this construction was prepared and an XbaI site was
introduced after codon 105 of the VH sequence
producing pGX3772''.

An EcoRI and an XbaI site were introduced into
pGX3773 between codons 41 and 45 of the VL sequence by
site directed mutagenesis producing pGX3773'.

To begin the assembly of the linker sequences
plasmid pGX3773' (VL) DNA was cleaved with EcoRI and
XbaI and treated with calf alkaline phosphatase. This
DNA was ligated to the EcoRI to XbaI fragment purified
from plasmid pGX3772'' (VH) which had been cleaved with
the two restriction enzymes. The resulting plasmid,
pGX3774, contained the light and heavy chain sequences
in the correct order linked by the EcoRI and XbaI
restriction sites. To insert the correct linker
sequences in frame, pGX3774 template DNA was prepared.
The EcoRI junction was removed and the linker coding
for the -Ile-Ala-Lys-Ala-Phe-Lys-Asn- inserted by site
directed mutagenesis, producing plasmid pGX3774.
Template DNA was prepared from this construction and
the XbaI site corrected and the linker coding for
-Pro-Gly-Ser- inserted by site directed mutagenesis
producing plasmid pGX3775. The sequence was found to
be correct as listed in Figure 24 by DNA sequencing.

In order to express the single-chain polypeptide,
the sequence as a ClaI to HindIII fragment was
inserted into a vector pGX3703. This placed the sequence
under the control of the hybrid lambda promoter
(U.S. Patent Application 534,982, Sept. 23, 1983).
The expression plasmid is pGX3776 (Figure 25). The
plasmid pGX3776 was transformed into a host containing
a heat sensitive lambda phage repressor; when grown at
30°C the synthesis of the TRY40 protein is repressed.
Synthesis was induced by raising the temperature to
42°C, and incubating for 8-16 hours. The protein was
produced at 7.2% of total cell protein, as estimated
on polyacrylamide gel electropherograms stained with
Coomassie blue.

EXAMPLE 2 
A. Computer Design

A one-linker example (referred to as TRY61) was
designed by the following steps.

Refer to stereo Figure 31A which shows the light
and heavy domains of the Fv; the constant domains are
discarded. A line joining the alpha carbons of the
light chain is dashed. The amino terminus of the
light chain is to the back and at about the center of
the picture and is labeled "N." At the right edge of
the picture, at about 2 o'clock is an arrow showing
the path toward the constant domain of the light
chain. Below the light chain is a line joining the
alpha carbons of the heavy chain. The amino terminus
of the heavy chain is toward the viewer at about 9
o'clock and is also labeled "N". At about 4:30, one
sees an arrow showing the heavy chain path to its
constant domain.

In addition to the alpha carbon traces, there are 
two segments in which all non-hydrogen atoms have been 
drawn. These segments are the last few residues in 
the light chain and the first ten in the heavy chain. 
Linkers were sought between all pairs of these resi- 
dues, but only a few were found because these regions 
are widely separated. 

Figure 31B shows the linker in place. Note that 
the molecule now proceeds from the amino terminal of 
the light chain to the carboxy terminal strand of the 
heavy chain. Note also that the antigen-binding re- 
gion is to the left, on the other side of the molecule 
from the linker. 

B. Genetic Constructs 

The sequence of TRY61 (a single-linker embodiment)
is Met-[L-chain 1-104]-Val-Arg-Gly-Ser-Pro-Ala-
Ile-Asn-Val-Ala-Val-His-Val-Phe-[H-chain 7-123]. The
nucleotide sequence and its translation are shown in
Figure 26.
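
The bracketed notation used for TRY40 and TRY61 can be
read as a recipe for splicing residue ranges. The Python
sketch below assembles a construct from such a recipe.
The placeholder chains (strings of "L" and "H"), the
chain lengths of 109 and 123 residues, and the function
names are assumptions for illustration only; the actual
sequences are those given in the figures. The heavy-chain
range for TRY40 is read here as 8-105, consistent with the
EcoRI site introduced before codon 8.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Gly": "G", "His": "H",
    "Ile": "I", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Val": "V",
}


def residues(chain, first, last):
    """Residues first..last of a chain, numbered from 1, inclusive."""
    return chain[first - 1:last]


def peptide(three_letter):
    """Convert 'Pro-Gly-Ser' style notation to one-letter code."""
    return "".join(THREE_TO_ONE[r] for r in three_letter.split("-"))


# Placeholder chains: 109 light-chain and 123 heavy-chain residues.
L = "L" * 109
H = "H" * 123

try40 = (peptide("Met") + residues(L, 1, 41)
         + peptide("Ile-Ala-Lys-Ala-Phe-Lys-Asn") + residues(H, 8, 105)
         + peptide("Pro-Gly-Ser") + residues(L, 45, 109))

try61 = (peptide("Met") + residues(L, 1, 104)
         + peptide("Val-Arg-Gly-Ser-Pro-Ala-Ile-Asn-Val-Ala-Val-His-Val-Phe")
         + residues(H, 7, 123))
```

With these placeholder chains the TRY40 layout comes to
215 residues and the TRY61 layout to 236 residues.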

To construct TRY61, plasmid pGX3772' DNA was
cleaved with ClaI and EcoRI and treated with calf
alkaline phosphatase. This DNA was ligated with the
ClaI to HindIII fragment from pGX3773 and two
oligonucleotides which code for the linker sequence and
have HindIII and EcoRI ends, so that the linker can
only be ligated in the correct orientation. The
resulting plasmid, pGX3777, was used to prepare template
DNA. This DNA was used for site directed mutagenesis
to remove the HindIII site inside the antibody
sequences. The correct construction, pGX3777', was used
to make template DNA for a site directed mutagenesis to
remove the EcoRI site. The ClaI to HindIII fragment
from the final construction, pGX3778, containing the
TRY61 coding sequence was confirmed by DNA sequencing.
The ClaI to HindIII fragment was inserted into the
pGX3703 expression vector. This plasmid is called
pGX4904 (Figure 27). This plasmid was transformed into
an E. coli host. The strain containing this plasmid has
been induced, and the single chain protein produced as
>2% of total cell protein.

EXAMPLE 3 
A. Computer Design 

A one-linker example (referred to as TRY59) was
designed by the following steps.

Refer to stereo Figure 32A which shows the light
and heavy domains of the Fv; the constant domains are
discarded. A line joining the alpha carbons of the
light chain is above and dashed. The amino terminus
of the light chain is to the back and at about 10
o'clock from the center of the picture and is labeled
"N". At the right edge of the picture, at about 2
o'clock is an arrow showing the path toward the
constant domain of the light chain. Below the light
chain is a line joining the alpha carbons of the heavy
chain. The amino terminus of the heavy chain is
toward the viewer at about 8 o'clock and is also labeled
"N". At about 4:30, one sees an arrow showing the
heavy chain path to its constant domain.

In addition to the alpha carbon traces, there are
two segments in which all non-hydrogen atoms have been
drawn. These segments are the last few residues in
the light chain and the first ten in the heavy chain.
Linkers were sought between all pairs of these
residues, but only a few were found because these regions
are widely separated.

Figure 32B shows the linker in place. Note that
the molecule now proceeds from the amino terminal of
the light chain to the carboxy terminal strand of the
heavy chain. Note also that the antigen-binding
region is to the left, on the other side of the molecule
from the linker.

The choice of end points in TRY59 is very similar
to TRY61. Linkers of this length are rare. The
tension between wanting short linkers that fit very
well, which could be found for the two-linker case
(TRY40), and the desire to have only one linker (which
is more likely to fold correctly) is evident in the
acceptance of TRY59. The linker runs from the point
marked "A" in Figure 32B to the point marked "J."
After five residues, the linker becomes helical. At
the point marked "x," however, the side-chain of an
ILE residue collides with part of the light chain.
Accordingly, that residue was converted to GLY in the
actual construction.

B. Genetic Constructs 

The sequence of TRY59 (the single linker
construction) is Met-[L-chain 1-105]-Lys-Glu-Ser-Gly-Ser-Val-
Ser-Ser-Glu-Gln-Leu-Ala-Gln-Phe-Arg-Ser-Leu-Asp-[H-
chain 2-123]. The nucleotide sequence coding for this
amino acid sequence and its translation is shown in
Figure 28. The BglI to HindIII fragment (read
clockwise) from plasmid pGX3773 containing the VL sequence
and the ClaI to BglI fragment (clockwise) from pGX3772
has been ligated with two oligonucleotides which form
a fragment containing the linker sequence for TRY59
and have ClaI and HindIII ends. The ClaI and HindIII
junctions within this plasmid are corrected by two
successive site directed mutageneses to yield the
correct construction. The ClaI to HindIII fragment from
this plasmid is inserted into the expression
vector as in Examples 1 and 2. The resulting
plasmid, pGX4908 (Figure 29) is transformed into an E.
coli host. This strain is induced to produce the
protein coded by the sequence in Figure 28 (TRY59).



EXAMPLE 4
A. Computer Design

In this design an alternative method of choosing a
linker to connect the light and heavy variable regions
was used. A helical segment from human hemoglobin was
chosen to span the major distance between the carboxy
terminus of the variable light chain and the amino
terminus of the variable heavy chain. This alpha
helix from human hemoglobin was positioned at the rear
of the Fv model using the computer graphics system.
Care was taken to position the helix with its ends
near the respective amino and carboxyl termini of the
heavy and light chains. Care was also taken to place
hydrophobic side chains in toward the Fv and
hydrophilic side chains toward the solvent. The
connections between the ends of the variable regions and
the hemoglobin helix were selected by the previously
described computer method (EXAMPLES 1-3).

B. Genetic Constructs

The sequence of TRY104b (a single linker
construction) is Met-[L-chain 1-106]-Ala-Glu-Gly-Thr-[(Hemo-
globin helix) Leu-Ser-Pro-Ala-Asp-Lys-Thr-Asn-Val-Lys-
Ala-Ala-Trp-Gly-Lys-Val-]Met-Thr-[H-chain 3-123]. The
nucleotide sequence coding for this amino acid
sequence and its translation is shown in Figure 33.
The BglI to HindIII fragment (read clockwise) from
plasmid pGX3773 containing the VL sequence and the
ClaI to BglI fragment (clockwise) from pGX3772 has
been ligated with two oligonucleotides which form a
fragment containing the linker sequence for TRY104b
and have ClaI and HindIII ends. The ClaI and HindIII
junctions within this plasmid are corrected by two
successive site directed mutageneses to yield the
correct construction. The ClaI to HindIII fragment
from this plasmid is inserted into the OL/PR
expression vector as in Examples 1-3. The resulting
plasmid, pGX4910 (Figure 34) is transformed into an E.
coli host. This strain is induced to produce the
protein coded by the sequence in Figure 33 (TRY104b).

EXAMPLE 5 
Purification of the Proteins 

The single-chain antigen binding proteins from
TRY40, TRY61, TRY59 and TRY104b are insoluble, and
cells induced to produce these proteins show
refractile bodies called inclusions upon microscopic
examination. Induced cells were collected by
centrifugation. The wet pellet was frozen on dry ice, then
stored at -20°C. The frozen pellet was suspended in a
buffer and washed in the same buffer, and subsequently
the cells were suspended in the same buffer. The
cells were broken by passage through a French pressure
cell, and the inclusion bodies containing the
single-chain antigen binding protein (SCA) were purified by
repeated centrifugation and washing. The pellet was
solubilized in guanidine-HCl, and reduced with
2-mercaptoethanol. The solubilized material was
passed through a gel filtration column, i.e.,
Sephacryl(TM) S-300. Other methods such as ion exchange
could be used.

EXAMPLE 6 
Folding of the Proteins 

Purified material was dialyzed against water, and
the precipitated protein collected by centrifugation.
The protein was solubilized in urea and reduced with
2-mercaptoethanol. This denatured and solubilized
material was dialyzed against a buffer containing salt
and reducing agents to establish the redox potential
to form the intra domain (one each for the light and
heavy chain variable region sequences) disulfide
bridges (Saxena and Wetlaufer, Biochem 9:5015-5023
(1970)). The folded protein was assayed for BGH
binding activity.

The TRY59 protein used in competition experiments
was solubilized and renatured directly from
inclusions. This material was subsequently purified by
affinity to BGH-Sepharose.

EXAMPLE 7
Binding Assay

BGH was immobilized on nitrocellulose strips along
with non-specific proteins such as bovine serum
albumin or lysozyme. Further non-specific protein
binding was blocked with an immunologically inert
protein, for example gelatin. Folded SCA was tested
for its ability to bind to BGH. The SCA was detected
by a rabbit anti-L chain (of the monoclonal)
anti-serum. The rabbit antibodies were reacted with
goat anti-rabbit IgG coupled to peroxidase. The
strips were reacted with chemicals which react with
the peroxidase to give a color reaction if the
peroxidase is present.

Figure 35 shows the result of this spot assay for
TRY61 (strip 1) and TRY40 (strip 2). Strip 3 was
stained with amido black to show the presence of all
three proteins. The other proteins, TRY59 and TRY104b,
gave similar results in the spot assay. A competition
assay with the SCA competing with the monoclonal can
be used as well. The results of competing the Fab of
3C2 monoclonal with 1 and 10 ug of TRY59 protein which
had been affinity purified are shown in Figure 36
(Fab alone, Fab + 1 ug TRY59, and Fab + 10 ug
TRY59). The affinity estimated from the IC50 of this
experiment was approximately 10^[...]. The data are
summarized in Table 1.

WE CLAIM:

1. A single polypeptide chain binding molecule
which has binding specificity substantially similar to
the binding specificity of the light and heavy chain
aggregate variable region of an antibody.

2. The molecule of claim 1 which comprises two 
peptide linkers joining said light and heavy chains 
into said single chain. 

3. The molecule of claim 2 which comprises in 
sequence: 

(a) an N-terminal region derived from said 
light chain; 

(b) a peptide linker; 

(c) a peptide region derived from said heavy 
chain; 

(d) a second peptide linker; and 

(e) a C-terminal region derived from said 
light chain. 

4. The molecule of claim 1 which comprises one 
peptide linker joining said light and heavy chains 
into said single chain. 

5. The molecule of claim 4 which comprises, in 
sequence: 

(a) an N-terminal region derived from said 
light chain; 

(b) a peptide linker; and 

(c) a C-terminal region derived from said 
heavy chain. 
6. The molecule of claim 4 which comprises in 
sequence: 

(a) an N-terminal region derived from said 
heavy chain; 

(b) a peptide linker; and 

(c) a C-terminal region derived from said 
light chain. 

7. The molecule of claim 3, 5 or 6 which, prior 
to said N-terminal region (a), comprises a methionine 
residue. 

8. The molecule of claim 1 which is detectably 
labeled. 

9. The molecule of claim 1 which is in immobil- 
ized form. 

10. The molecule of claim 1 which is conjugated 
to an imaging agent. 

11. The molecule of claim 1 which is conjugated 
to a toxin. 

12. A genetic sequence coding for the molecule of 
claim 1. 

13. A recombinant DNA (rDNA) molecule comprising 
the sequence of claim 12. 

14. The rDNA molecule of claim 13 which is a rep- 
licable cloning or expression vehicle. 
15. The rDNA molecule of claim 14 wherein said 
vehicle is a plasmid. 

16. A host cell transformed with the rDNA mole- 
cule of claim 13. 

17. The host cell of claim 16 which is a bacter- 
ial cell, a yeast or other fungal cell or a mammalian 
cell line in vitro. 

18. A method of producing a single polypeptide 
chain binding molecule which has binding specificity 
substantially similar to the binding specificity of 
the light and heavy chain aggregate variable region of 
an antibody, which comprises: 

(a) providing a genetic sequence coding for 
said molecule; 

(b) transforming a host cell with said se- 
quence; 

(c) expressing said sequence in said host; 
and 

(d) recovering said molecule. 

19. The method of claim 18 which further 
comprises purifying said recovered molecule. 

20. The method of claim 18 wherein said host cell 
is a bacterial cell, yeast or other fungal cell, or a 
mammalian cell line. 



21. The binding molecule produced by the method 
of claim 18 or 19. 
22. In an immunoassay method which utilizes an 
antibody in labeled form, the improvement comprising 
using the molecule of claim 8 instead of said anti- 
body. 

23. In an immunoassay method which utilizes an 
antibody in immobilized form, the improvement compris- 
ing using the molecule of claim 9 instead of said an- 
tibody. 

24. In the immunoassay of claim 21 or 22 wherein 
said immunoassay is a competitive immunoassay. 

25. In the immunoassay of claim 21 or 22 wherein 
said immunoassay is a sandwich immunoassay. 

26. In an immunotherapeutic method which utilizes 
an antibody conjugated to a therapeutic agent, the 
improvement comprising using the molecule of claim 1 
instead of said antibody. 

27. In a method of immunoaffinity purification 
which utilizes an antibody therefor, the improvement 
which comprises using the molecule of claim 1 instead 
of said antibody. 
                5                   10                  15                  20
Glu Val Gln Leu Val Glu Ser Gly Gly Asp Leu Val Lys Pro Gly Gly Ser Leu Lys Leu
GAG GTG CAC CTG GTG GAG TCT GGG GGA GAG TTA GTG AAG CCT GGA GGG TCC CTG AAA CTC

                25                  30                  35                  40
Ser Cys Ala Ala Ser Gly Phe Thr Phe Ile Ser Tyr Gly Met Ser Trp Val Arg Gln Thr
TCC TGT GGA GCC TCT GGA TTC ACT TTC ATT AGC TAT GGC ATG TCT TGG GTT CGC CAG ACT

                45                  50                  55                  60
Pro Asp Lys Arg Leu Glu Trp Val Ala Thr Ile Ser Ser Gly Ser Thr Tyr Thr Tyr Tyr
CCA GAC AAG AGG CTG GAG TGG GTC GCA ACC ATT AGT AGT GGT AGT ACT TAC ACC TAG TAT

                65                  70                  75                  80
Pro Asp Ser Val Lys Gly Arg Phe Thr Ile Ser Arg Asp Asn Ala Lys Asn Thr Leu Tyr
CCA GAC AGT GTG AAG GGG CGA TTC ACC ATC TCC AGA GAC AAT GCC AAG AAC ACC CTG TAC

                85                  90                  95                  100
Leu Gln Met Ser Gly Leu Lys Ser Glu Asp Thr Ala Met Tyr Tyr Cys Ala Arg Arg Ile
CTG CAA ATG AGC GGT CTG AAG TCT GAG GAC ACA GCC ATG TAT TAC TGT GCA AGA CGG ATT

                105                 110                 115                 120
Thr Thr Val Val Leu Thr Asp Tyr Tyr Ala Met Asp Tyr Trp Gly Gln Gly Thr Ser Val
AGT ACG GTA GTA CTT ACG GAT TAG TAT GCT ATG GAC TAC TGG GGT CAA GGA ACC TCA GTC

                125                 130                 135                 140
Thr Val Ser Ser Ala Lys Thr Thr Pro Pro Ser Val Tyr Pro Leu Ala Pro Gly Ser Ala
ACC GTC TCC TCA GCC AAA ACG ACA CCC CCA TCT GTC TAT CCA CTG GCC CCT GGA TCT GCT

                145                 150                 155                 160
Ala Gln Thr Asn Ser Met Val Thr Leu Gly Cys Leu Val Lys Gly Tyr Phe Pro Glu Pro
GCC CAA ACT AAC TCG ATG GTG ACC CTG GGA TGC CTG GTC AAG GGC TAT TTC CCT GAG CCA

                165                 170                 175                 180
Val Thr Val Thr Trp Asn Ser Gly Ser Leu Ser Ser Gly Val His Thr Phe Pro Ala Val
GTG ACA GTG ACC TTC AAC TCT GGA TCC CTG TCC AGC GGT GTG CAC ACC TTC CCA GCT GTC

                185                 190                 195                 200
Leu Gln Ser Asp Leu Tyr Thr Leu Ser Ser Ser Val Thr Val Pro Ser Ser Thr Trp Pro
CTG CAG TCT GAC CTC TAC ACT CTG AGC AGC TCA GTG ACT GTG CCC TCC AGC ACC TGG CCC

                205                 210                 215                 220
Ser Glu Thr Val Thr Cys Asn Val Ala His Pro Ala Ser Ser Thr Lys Val Asp Lys Lys
AGC GAG ACC GTC ACC TGC AAC GTT GCC CAC CCG GCG AGC ACC ACC AAG GTG GAC AAG AAA

                225                 230                 235                 240
Ile Val Pro Arg Asp Cys Gly Cys Lys Pro Cys Ile Cys Thr Val Pro Glu Val Ser Ser
ATT GTG CCC AGG GAT TGT GGT TGT AAG CCT TGC ATA TGT ACA GTC CCA GAA GTA TCA TCT

                245                 250                 255                 260
Val Phe Ile Phe Pro Pro Lys Pro Lys Asp Val Leu Thr Ile Thr Leu Thr Pro Lys Val
GTC TTC ATC TTC CCC CCA AAG CCC AAG GAT GTG CTC ACC ATT ACT CTG ACT CCT AAG GTC

                265                 270                 275                 280
Thr Cys Val Val Val Asp Ile Ser Lys Asp Asp Pro Glu Val Gln Phe Ser Trp Phe Val
ACG TGT GTT GTG GTA GAC ATC AGC AAG GAT GAT CCC GAG GTC CAG TTC AGC TGG TTT GTA

                285                 290                 295                 300
Asp Asp Val Glu Val His Thr Ala Gln Thr Gln Pro Arg Glu Glu Gln Phe Asn Ser Thr
GA? GAT GTG GAG CTG CAC ACA GCT CAG ACG CAA CCC CGG GAG GAG CAG TTC AAC AGC ACT

                305                 310                 315                 320
Ser Arg Ser Val Ser Glu Leu Pro Ile Met His Gln Asp Trp Leu Asn Gly Lys Glu Phe
TCC CGC TCA GTC AGT GAA CTT CCC ATC ATG CAC CAG GAC TGG CTC AAT GGC AAG GAG TTC

                325                 330                 335                 340
Lys Cys Arg Val Asn Ser Ala Ala Phe Pro Ala Pro Ile Glu Lys Thr Ile Ser Lys Thr
AAA TGC AG? GTC AAC AGT GCA GCT TTC CCT GCC CCC ATC GAG AAA ACC ATC TCC AAA ACC

                345                 350                 355                 360
Lys Gly Arg Pro Lys Ala Pro Gln Val Tyr Thr Ile Pro Pro Pro Lys Glu Gln Met Ala
AAA GGC AGA CCG AAG GCT CCA CAG GTG TAC ACC ATT CCA CCT CCC AAG GAG CAG ATG GCC

                365                 370                 375                 380
Lys Asp Lys Val Ser Leu Thr Cys Met Ile Thr Asp Phe Phe Pro Glu Asp Ile Thr Val
AAG GAT AAA CTC AGT CTG ACC TGC ATG ATA ACA GAC TTC TTC CCT GAA GAC ATT ACT GTG

                385                 390                 395                 400
Glu Trp Gln Trp Asn Gly Gln Pro Ala Glu Asn Tyr Lys Asn Thr Gln Arg Ile Met Asn
GAG TGG CAG TGG AAT GGG CAG CCA GCG GAG AAC TAC AAG AAC ACT CAG CGC ATC ATG AAC

                405                 410                 415                 420
Thr Asn Gly Ser Tyr Phe Val Tyr Ser Lys Leu Asn Val Gln Lys Ser Asn Trp Glu Ala
ACG AAT GGC TCT TAC TTC GTC TAC ACC AAG CTC AAT GTG CAG AAG AGC AAC TGG GAG GCA

                425                 430                 435                 440
Gly Asn Thr Phe Thr Cys Ser Val Leu His Glu Gly Leu His Asn His His Thr Glu Lys
GGA AAT ACT TTC ACC TGC TCT GTG TTA CAT GAG GGC CTG CAC AAC CAC CAT ACT GAG AAG

                445
Ser Leu Ser His Ser Pro Gly Lys ***
AGC CTC TCC CAC TCT CCT GGT AAA TGA
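
Because each residue line in these figures is printed directly above its codon line, the two can be cross-checked mechanically. The following Python sketch translates a codon line with the standard genetic code and reports positions where the printed residue disagrees with the translation. It is a hedged illustration only: the helper names `translate` and `mismatches` are hypothetical, and a reported disagreement may reflect a misread character in either line rather than a real difference in the sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",
}

CODON_TABLE = {
    a + b + c: AMINO_1[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codon_line):
    """Translate space-separated codons into three-letter residue names.

    Unreadable or non-standard codons are returned as '???'.
    """
    return [
        THREE_LETTER.get(CODON_TABLE.get(codon, ""), "???")
        for codon in codon_line.upper().split()
    ]


def mismatches(residue_line, codon_line):
    """List (position, printed residue, translated residue) disagreements."""
    printed = residue_line.split()
    translated = translate(codon_line)
    return [
        (i + 1, p, t)
        for i, (p, t) in enumerate(zip(printed, translated))
        if p != t
    ]
```

Applied to the row for residues 81 to 100 above (`CTG CAA ATG AGC GGT CTG AAG TCT GAG GAC ACA GCC ATG TAT TAC TGT GCA AGA CGG ATT`), `translate` gives Leu Gln Met Ser Gly Leu Lys Ser Glu Asp Thr Ala Met Tyr Tyr Cys Ala Arg Arg Ile, which agrees with the printed residues, so `mismatches` returns an empty list for that row.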
Glu Asn Val Leu Thr Gln Ser Pro Ala Ile Met Ser Ala 
GAA AAT GTG CTC ACC CAG TCT CCA GCA ATC ATG TCT GCA 



30 



Met Thr Cys Arg 
ATG ACC TGC AGG 

Ser Gly Ala Ser 
TCA GGT GCC TCC 

Ala Arg Phe Ser 
GCT CGC TTC AGT 

Ala Glu Asp Ala 
GCT GAA GAT GCT 



5 10 
T>nr Gin Ser Pro Ala He Met: 
ACC CAG TCT CCA GCA ATC ATG 

.25 

Ala Ser 
GCC AGC 

45 
Pro Lys 
CCC AAA 



Ser Tyr 
AGT TAC 



Ala Gly Thr Lys Leu Glu Leu Lya^Arg Ala Asp Ala Ala 
GCT GGG ACC AAG CTG GAG CTG AAAJCGG GCT GAT GCT GCA 



Pro Ser Ser Glu 
CCA TCC AGT GAG 

Tyr Pro Lys Asp 
TAC CCC AAA GAC 



65 
Gly Ser 
GGC AGT 

85 
Ala Thr 
GCC ACT 

105 

Leu Glu 
CTG GAG 

125 

Gin Leu 
CAG TTA 

145 

He Asn 
ATC AAT 

165 

Thr Asp 
ACT GAT 



Ser Ser Val Ser Ser 
TCA AGT GTA AGT TCC 
50 

Leu Trp lie Tyr Ser Thr Ser 
CTC TGG ATT TAT AGC ACA TCC 
70 

Gly Ser Gly Thr Ser 
GGG TCT GGG ACC TCT 

90 

Tyr Tyr Cys Gin Gin Tyr Ser 
TAT TAC TGC CAG CAG TAG AGT 



Tyr Ser 
TAC TCT 



Leu Lys, Arg 
CTG AA^CGG 

Thr Ser Gly 
ACA TCT GGA 

Val Lys Trp 
GTC AAG TGG 



Leu Asn Ser Trp Thr Asp Gin Asp Ser Lys Asp Ser Thr 
CTG AAC AGT TGG ACT GAT CAG GAC AGC AAA GAC AGC ACC 



110 

Ala Asp 
GCT GAT 

130 

Gly Ala 
GGT GCC 

150 

Lys He 
AAG ATT 

170 

Lys Asp 
AAA GAC 



Ser Val 
TCA GTC 

Asp Gly 
GAT GGC 



185 190 
Asp Glu Tyr Glu Arg His Asn Ser Tyr 
GAC GAG TAT GAA CGA CAT AAC AGC TAT 



Thr Leu Thr Lys 
ATG TTG ACC AAG 

205 210 
Thr Ser Thr Ser Pro He Val Lys Ser Phe Asn Ar 
ACA TCA ACT TCA CCC ATT GTC AAG AGC TTC AAC AG 



Asn 
AAT 



Ser 
TCT 


15 
Pro 
CCA 


wi.y 
GGG 


nil 1 
GAA 


aXI 


VAI 
GTC 


■ 20 

Thr- 

I.I H 

ACC 


Leu 
TTG 


35 
His 
CAG 


Trp 
TGG 


me 
TTC 


Lain 
CAG 


R 1 m 

om 
CAG 


40 
Lys 
AAG 


Asn 
AAC 


55 
Leu 
TTG 


Ala 
GCT 


Ser 
TCT 


Gly 
GGA 


Val 
GTC 


60 
Pro 
CCT 


Leu 
CTC 


75 
Thr 
ACA 


He 
ATC 


Ser 
AGC 


Ser 
AGT 


Vai 
GTG 


80 
Glu 
GAG 


95 

Gly Tyr 
GGT TAG 


Pro 
CCA 


Leu 
CTC 


Thr 
ACG 


Phe 
TTC 


100 


Pro 
CCA 


115 
Thr 
ACT 


Val 
GTA 


Ser 
TCC 


He 
ATC 


Phe 
TTC 


120 
Pro 
CCA 


Val 
GTG 


135 
T& 


Phe 
TTC 


Leu 
TTG 


Asn 
AAC 


Asn 
AAC 


140 
Phe 
TTC 


Ser 
AGT 


155 
Glu 
GAA 


Arg 
CGA 


Gin 
CAA 


Asp 
AAT 


Gly 
GGC 


160 
Val 
GTC 


Tyr 
TAG 


175 
Ser 
AGC 


Met: 
ATG 


Ser 
AGC 


Ser 
AGC 


Thr 
ACC 


180 
Leu 
CTC 


Thr 
ACC 


195 


Glu 
GAG 


Ala 
GCC 


Thr 
ACT 


His 
CAG 


200 
AAG 


Glu 
GAG 


215 


TAG 
10 
Ala 


He 


Met 


Sei- 


Ala 


CCA 


GCA 


ATC 


ATG 


TCT 


GCA 


"er 


3Q 
Val 
b 1 A 


Ser 

ML3 1 


Ser 
TCC 


Ser 
AGT 


Tyr 
TAC 


\sn 
AC 


50 
Gly 
GGG 


Gly 
GGA 


Asp 
GAC 


Leu 
TTA 


Val 
GTG 


'-ha 
TTC 


70 
Thr 
ACT 


Phe 
TTC 


He 
ATT 


Ser 
AGC 


Tyr 
TAT 




90 
Val 
GTC 


Ala 
GCA 


Thr 
ACC 


He 
ATT 


Ser 
AGT 


'rg 


110 
Phe 
TTC 


Thr 
ACC 


He 
ATC 


Ser 
TCC 


Arg 
AGA 


\AG 


130 
Ser 
TCT 


Glu 
GAG 


Asp 
GAC 


Thr 
ACA 


Ala 
GCC 


3AT 


150 
Tyr 
TAG 




Ala 
GCT 


Met 
ATG 


Asp 
GAC 


rcc 


170 
Asn 
AAC 


Leu 
TTG 


Ala Ser Gly 
GCT TCT GGA 


JT 


150 
Leu 
CTC 


Thr 
ACA 


He 
ATC. 


Ser 
AGC 


Ser 
AGT 


AGT 


aia 

Gly 
C^T 


Tyr 
TAG 


Pro 
CCA 


Leu 
CTC 


Thr 
ACG 
15 
Sef- 

TCT 


Pro 
CCA 


Gly 
GGG 


Glu 
GAA 


20 

Lys Val 
AAG GTC 


35 
Leu 
TTG 


His 
CAC 


Trp 
TGG 


Phe 
TTC 


Gin 
CAG 


40 
Gin 
CAG 


55 

aXg 


Pro 
CCT 


Gly 
GGA 




Ser 
TCC 


60 
Leu 
CTG 


75 
Gly 
GGC 


Met 
ATG 


Ser 
TCT 


Trp 
TGG 


Val 
GTT 


SO 
CGC 


95 
Ser 
GGT 


Gly 
GGT 


Ser 
AGT 


Thr 
ACT 


Tyr 
TAC 


100 
Thr 
ACC 


115 
Asp 
GAC 


Asn Ala 
AAT'GCC 


Lys 
AAG 


Asn 
AAC 


120 
Thr 
ACC 


135 
Met 
ATG 


Tyr 
TAT 




^ 


Ala 
GCA 


140 
Arg 
AGA 


155 
Tyr 
TAC 


Trp 
TGG 




Pro 
CCG 


Gly 
GGT 


160 
Ser 
TCT 


175 
Val 
GTC 


Pro 
CCT 


Ala 
GCT 


Arg 
CGC 


Phe 
TTC 


180 
Ser 
AGT 


195 
Val 
GTG 


Glu 
GAG 


Ala 
GCT 


Glu 
GAA 


200 
AsD Ala 
GAT GCT 


215 
Phe 
TTC 


Gly 
GGT 


Ala 
GCT 


Gly 
GGG 


220 
Thr Lys 
ACC AAG 
TRY61 

SCA TRY61 

(--- marks a residue or codon that is missing in the source; ? marks an illegible base.) 

                5                   10                  15                  20
Met Glu Asn Val Leu Thr Gln Ser Pro Ala Ile Met Ser Ala Ser Pro Gly Glu Lys Val
ATG GAA AAT GTG CTC ACC CAG TCT CGA GCA ATC ATG TCT GCA TCT CCA GGG GAA AAG GTC

                25                  30                  35                  40
Thr Met Thr Cys Arg Ala Ser Ser Ser Val Ser Ser Ser --- Leu His Trp Phe Gln Gln
ACC ATG ACC TGC AGG GCC AGC TCA AGT GTA AGT TCC AGT --- TTG CAC TGG TTC CAG CAG

                45                  50                  55                  60
Lys Ser Gly Ala Ser Pro Lys Leu Trp Ile Tyr Ser Thr Ser Asn Leu Ala Ser Gly Val
AAG TCA GGT GCC TCC CCC AAA CTC TGG ATT TAT AGC ACA TCC AAC TTC GCT TCT GGA GTC

                65                  70                  75                  80
Pro Ala Arg Phe Ser Gly Ser Gly Ser Gly Thr Ser --- Ser Leu Thr Ile Ser Ser Val
CCT GCT CGC TTC AGT GGC AGT GGG TCT GGG ACC TCT --- TCT CTC ACA ATC AGC AGT GTG

                85                  90                  95                  100
Glu Ala Glu Asp Ala Ala Thr Tyr Tyr Cys Gln Gln --- Ser Gly Tyr Pro Leu Thr Phe
GAG GCT GAA GAT GCT GCC ACT TAT TAC TGC CAG TAC A?T GGT ?GT TAC CCA CTC ACG TTC

                105                 110                 115                 120
Gly Ala Gly Thr Lys Val --- Gly Ser Pro Ala Ile Asn Val Ala Val His Val Phe Ser
GGT GCT GGG ACC AAG GTT --- GGT TCT C?? GCA ATC AAC GTA GCT GTA CA? GTA TTC TCT

                125                 130                 135                 140
Gly Gly Asp Leu Val Lys Pro Gly Gly Ser Leu Lys Leu Ser --- Ala Ala Ser Gly Phe
GGG GGA GAC TTA GTG AAG CCT GGA GGG TCC CTG AAA CTC TCC --- GCA GCC TCT GGA TTC

                145                 150                 155                 160
Thr Phe Ile Ser Tyr Gly Met Ser Trp Val Arg Gln Thr Pro Asp Lys Arg Leu Glu Trp
ACT TTC ATT AGC TAT GGC ATG TCT TGG GTT CGC CAG ACT CCA GAC AAG AGG CTG GAG TGG

                165                 170                 175                 180
Val Ala Thr Ile Ser Ser Gly Ser Thr Tyr Thr Tyr Tyr Pro Asp Ser Val Lys Gly Arg
GTC GGA ACC ATT AGT AGT GGT AGT ACT TAC ACC TAC TAT CCA GAC AGT GTG AAG GGG CGA

                185                 190                 195                 200
Phe Thr Ile Ser Arg Asp Asn Ala Lys Asn Thr Leu Tyr Leu Gln Met Ser Gly Leu Lys
TTC ACC ATC TCC AGA GAC AAT GCC AAG AAC ACC CTG TAC CTG CAA ATG AGC GGT CTG AAG

                205                 210                 215                 220
Ser Glu Asp Thr Ala Met Tyr Tyr Cys Ala Arg Arg Ile Thr Thr Val Val Leu Thr Asp
TCT GAG GAC ACA GCC ATG TAT TAC TGT GCA AGA CGG ATT ACT ACG GTA GTA CTT ACG GAT

                225                 230                 235
Tyr Tyr Ala Met Asp --- Trp Gly Gln Gly Thr Ser Val Thr Val Ser
TAG TAT GCT ATG GAC --- TGG GGT CAA GGA ACC TCA GTC ACC GTC TCC TAA
                5                   10                  15                  20
Met Glu Asn Val Leu Thr Gln Ser Pro Ala Ile Met Ser Ala Ser Pro Gly Glu Lys Val
ATG GAA AAT GTG CTC ACC CAG TCT CGA GCA ATC ATG TCT GCA TCT CCA GGG GAA AAG GTC

                25                  30                  35                  40
Thr Met Thr Cys Arg Ala Ser Ser Ser Val Ser Ser Ser Tyr Leu His Trp Phe Gln Gln
ACC ATG ACC TGC AGG GCC AGC TCA AGT GTA AGT TCC AGT TAC TTG CAC TGG TTC CAG CAG

                45                  50                  55                  60
Lys Ser Gly Ala Ser Pro Lys Leu Trp Ile Tyr Ser Thr Ser Asn Leu Ala Ser Gly Val
AAG TCA GGT GCC TCC CCC AAA CTC TGG ATT TAT AGC ACA TCC AAC TTC GCT TCT GGA GTC

                65                  70                  75                  80
Pro Ala Arg Phe Ser Gly Ser Gly Ser Gly Thr Ser Tyr Ser Leu Thr Ile Ser Ser Val
CCT GCT CGC TTC AGT GGC AGT GGG TCT GGG ACC TCT TAC TCT CTC ACA ATC AGC AGT GTG

                85                  90                  95                  100
Glu Ala Glu Asp Ala Ala Thr Tyr Tyr Cys Gln Gln Tyr Ser Gly Tyr Pro Leu Thr Phe
GAG GCT GAA GAT GCT GCC ACT TAT TAC TGC CAG TAC AGT GGT GCT TAC CCA CTC ACG TTC

                105                 110                 115                 120
Gly Ala Gly Thr Lys Leu Lys Glu Ser Gly Ser Val Ser Ser Glu Gln Leu Ala Gln Phe
GGT GCT GGG ACC AAG CTG AAA GAA TCT GGT TCT GTT TCT TCT GAA CAG CTG GCT CAG TTT

                125                 130                 135                 140
Arg Ser Leu Asp Val Gln Leu Val Glu Ser Gly Gly Asp Leu Val Lys Pro Gly Gly Ser
CGT TCT CTG GAT GTG CAG CTG GTG GAG TCT GGG GGA GAC TTA GTG AAG CCT GGA GGG TCC

                145                 150                 155                 160
Leu Lys Leu Ser Cys Ala Ala Ser Gly Phe Thr Phe Ile Ser Tyr Gly Met Ser Trp Val
CTG AAA CTC TCC TGT GCA GCC TCT GGA TTC ACT TTC ATT AGC TAT GGC ATG TCT TGG GTT

                165                 170                 175                 180
Arg Gln Thr Pro Asp Lys Arg Leu Glu Trp Val Ala Thr Ile Ser Ser Gly Ser Thr Tyr
CGC CGA ACT CCA GAC AAG AGG CTG GAG TGG GTC GCA ACC ATT AGT AGT GGT AGT ACT TAC

                185                 190                 195                 200
Thr Tyr Tyr Pro Asp Ser Val Lys Gly Arg Phe Thr Ile Ser Arg Asp Asn Ala Lys Asn
ACC TAG TAT CCA GAC AGT GTG AAG GGG CGA TTC ACC ATC TCC AGA GAC AAT GCC AAG AAC

                205                 210                 215                 220
Thr Leu Tyr Leu Gln Met Ser Gly Leu Lys Ser Glu Asp Thr Ala Met Tyr Tyr Cys Ala
ACC CTG TAC CTG CAA ATG AGC GGT CTG AAG TCT GAG GAC ACA GCC ATG TAT TAC TGT GCA

                225                 230                 235                 240
Arg Arg Ile Thr Thr Val Val Leu Thr Asp Tyr Tyr Ala Met Asp Tyr Trp Gly Gln Gly
AGA CGG ATT ACT ACG GTA GTA CTT ACG GAT TAC TAT GCT ATG GAC TAC TGG GGT CAA GGA

                245
Thr Ser Val Thr Val Ser ***
ACC TCA GTC ACC GTC TCC TAA
TRY104b 

SCA TRY104B, ALL OF VL AND VH 



Met 
ATG 


Glu 
GAA 


Asn 
AAT 


Vai 
GTG 


5 

Leu 
CTC 


Thr 
ACC 


Gin 
CAG 


Ser 
TCT 


Pro 
CCA 


10 
Ala 
GCA 


He 
ATC 


Met 
ATG 


Ser 
TCT 


Ala 
GCA 


15 
Ser 
TCT 


Pro 
CCA 


Gly 
GGG 


Glu 
GAA 


aXg 


Thr 
ACC 


Mei: 
ATG 


Thr 
ACC 


t2c 


25 
Arq 
AGG 

45 


Ala 
GCC 


Ser 
AGC 


Ser 


Ser 
AGT 


30 
Val 
GTA 

SO 


Ser 
AGT 


Ser 
TCC 


Ser 
AGT 


tJ£ 


35 
Leu 
AGT 

55 


His 
TAC 


Trp 
TGG 


Phe 
TTC 


Gin 
CAG 


Lys 
AAG 


Ser 
TCA 


Gly 
GGT 


Ala 
GCC 


Ser 
TCC 


Pro 
CCC 


Lys 
AAA 


Leu 
CTC 




He 
ATT 


Tyr 
TAT 


Ser 
AGC 


Thr 
ACA 


Ser 
TCC 


Asn 
AAC 


Leu 
TTG 


Ala 
GCT 


Ser 
TCT 


Gly 
GGA 


Pro 
CCT 


Ala 
GCT 


Arq 
CGC 


Phe 
TTC 


65 
Ser 
AGT 


Gly 
GGC 


Ser 
AGT 


Gly 
GGu 


Ser 
TCT 


70 
Gly 
GGG 


Thr 
ACC 


Ser 
TCT 


TXS 


Ser 
TCT 


75 
Leu 

(J 1 U 


Thr 
ACA 


He 
ATC 


Ser 
AGC 


Ser 
AGT 


Glu 
GAG 


Ala 
GCT 


Glu 
GAA 


Asp 
GAT 


as 

Ala 
GCT 


Ala 
GCC 


Thr 
ACT 


Tyr 
TAT 


Tyr 
TAC 


90 
TGC 


Gin 
CAG 


Gin 
CAG 


Tyr 
TAC 


Ser 
AGT 


95 


Tyr 
TAC 


Pro 
CCA 


Leu 
CTC 


Thr 
ACG 


Gly 
GGT 


Ala 
GCT 


Gly 
GGG 


Thr 
ACC 


105 

aXg 


Leu 
CTG 


Glu 
GAG 


Ala 
GCA 


Glu 
GAA 


110 
Gly 
GGC 


Thr Leu 
ACT CTG 


Ser 
TCT 


Pro 
CCA 


115 
Ala 
GCA 


Asp Lys 
GAT AAA 


Thr 
ACT 


Asn 
AAC 


aXa 


•a- 

^"^la 
GCA 


Ala 
GCA 


Trp 
TGG 


125 
Gly 
GGC 


Lys 
AAA 


Val 
GTT 


Met 
ATG 


Thr 
ACT 


130 
Gin 
CAG 


Leu 
CTG 


Val 
GTG 


Glu 
GAG 


Ser 
TCT 


135 
Gly 
GGG 


Gly 
GGA 


Asp Leu 
GAC TTA 


Val 
GTG 


Pro Gly 
CCT GGA 


Gly 
GGG 


Ser 
TCC 


145 

Leu Lys 
CTG AAA 


Leu 
CTC 


Ser 
TCC 


150 
Cys Ala 
TGT GCA 


Ala Ser Gly 
GCA TCT GGA 


Phe 
TTC 


155 
Thr 
ACT 


Phe 
TTC 


He Ser Tyr 
ATT AGC TAT 


Met 
ATG 


Ser 
TCT 


Trp' 
TGG 


Val 
GTT 


165 
Arg 
CGC 


Gin 
CAG 


Thr 
ACT 


Pro 
CCA 


170 
Asp Lys 
GAC AAG 


Arg Leu 
AGG CTG 


Glu 
GAG 


Trp 
TGG 


175 
Val 
GTC 


Ala 
GCA 


Thr 
ACC 


He 
ATT 


Ser 
AGT 


Gly 
GGT 


Ser 
AGT 


Thr 
ACT 


Tyr 
TAC 


185 

Thr Tyr 
ACC TAC 


Tyr 
TAT 


Pro 
CCA 


Asp 
GAC 


130 
Ser 
AGT 


Val Lys Gly Arg 
GTG AAG GGG CGA 


195 
Phe 
TTC 


Thr 
ACC 


He Ser Arg 
ATC TCC AGA 


Asn 
AAT 


Ala 
GCC 


Lys 
AAG 


Asn 
AAC 


205 
Thr 
ACC 


Leu 
CTG 


Tyr 
TAC 


Leu 
CTG 


Gin 
CAA 


210 
Mat 
ATG 


Ser 
AGC 


Gly 
GGT 


Leu Lys 
CTG AAG 


215 
Ser 
TCT 


Glu 
GAG 


Asa 
GAC 


Thr 
ACA 


Ala 
GCC 


Tyr 
TAT 


Tyr 
TAC 


?s? 


Ala 
GCA 


225 
Arg 
AGA 


Arg 
CGG 


He 
ATT 


Thr 
ACT 


Thr 
ACG 


230 
Val 
GTA 


Val 
GTA 


Leu 
CTT 


Thr 
ACG 


Asp 
GAT 


235 
Tyr 
TAC 


Tyr 
TAT 


Ala 
GCT 


Met 
ATG 


Asp 
GAC 










245 










250 


««< 


















Tra Glv 


Gin 


Gly 


Thr 


Ser 


Val 


Thr 


Val 


Ser 


















TGG 


GGT 


CAA 


GGA 


ACC 


TCA 


GTC 


ACC 


GTC 


TCC 


TAA 



















20 
Val 
GTC 

40 
Gin 
CAG 

60 

Val 
GTC 

80 
Val 
GTG 

100 
Phe 
TTC 

120 
Val 
GTT 

140 

aXg 

160 
Gly 
GGC 

180 
Ser 
AGT 

200 
Asp 
GAC 

220 
Met 
ATG 

240 
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